CDMA transceiver techniques for wireless communications

ABSTRACT

The present invention is related to a method for multi-user wireless transmission of data signals in a communication system having at least one base station and at least one terminal. It comprises, for a plurality of users, the following steps: 
     adding robustness to frequency-selective fading to the data to be transmitted,
     performing spreading and scrambling of at least a portion of a block of data, obtainable by grouping data symbols by demultiplexing using a serial-to-parallel operation,  
     combining (summing) spread and scrambled portions of the blocks of at least two users,  
     adding transmit redundancy to the combined spread and scrambled portions, and  
     transmitting the combined spread and scrambled portions with transmit redundancy.

FIELD OF THE INVENTION

[0001] The present invention is related to a method for Wideband Code Division Multiple Access (WCDMA) wireless communication systems, suitable for communication over frequency-selective fading channels.

INTRODUCTION TO THE STATE OF THE ART

[0002] Wideband CDMA is emerging as the predominant wireless access mode for forthcoming 3G systems, because it offers higher data rates and supports a larger number of users over mobile wireless channels compared to access techniques like TDMA and narrowband CDMA. Especially in the downlink (from base to mobile station) direction, the main drivers toward future broadband cellular systems require higher data rates. There are several main challenges to successful transceiver design. First, for increasing data rates, the underlying multi-path channels become more time-dispersive, causing Inter-Symbol Interference (ISI) and Inter-Chip Interference (ICI), or equivalently frequency-selective fading. Second, due to the increasing success of future broadband services, more users will try to access the common network resources, causing Multi-User Interference (MUI). Both ISI/ICI and MUI are important performance limiting factors for future broadband cellular systems, because they determine their capabilities in dealing with high data rates and system loads, respectively. Third, cost, size and power consumption issues put severe constraints on the receiver complexity at the mobile.

[0003] Direct-Sequence (DS) Code Division Multiple Access (CDMA) has emerged as the predominant air interface technology for the 3G cellular standard, because it increases capacity and facilitates network planning in a cellular system. DS-CDMA relies on the orthogonality of the spreading codes to separate the different user signals. However, ICI destroys the orthogonality among users, giving rise to MUI. Since the MUI is essentially caused by the multi-path channel, linear chip-level equalization, combined with correlation with the desired user's spreading code, allows the MUI to be suppressed. However, chip equalizer receivers suppress MUI only statistically, and require multiple receive antennas to cope with the effects caused by deep channel fades.

[0004] Multiple Input Multiple Output (MIMO) systems with several transmit and several receive antennas are able to realize a capacity increase in rich scattering environments. Space-Time coding is an important class of MIMO communication techniques that achieve a high quality-of-service over frequency-flat fading channels by introducing both temporal and spatial correlation between the transmitted signals. It has already been combined with single-carrier block transmission to achieve maximum diversity gains over frequency-selective fading. Up to now, however, the focus was mainly on single-user point-to-point communication links.

DETAILED STATE OF THE ART

[0005] The main drivers toward future broadband cellular systems, like high-speed wireless internet access and mobile multimedia, require much higher data rates in the downlink (from base to mobile station) than in the uplink (from mobile to base station) direction. Given the asymmetric nature of most of these broadband services, the capacity and performance bottlenecks clearly reside in the downlink of these future systems. Broadband cellular downlink communications pose three main challenges to successful transceiver design. First, for increasing data rates, the underlying multi-path channels become more time-dispersive, causing Inter-Symbol Interference (ISI) and Inter-Chip Interference (ICI), or equivalently frequency-selective fading. Second, due to the increasing success of future broadband services, more users will try to access the common network resources, causing Multi-User Interference (MUI). Both ISI/ICI and MUI are important performance limiting factors for future broadband cellular systems, because they determine their capabilities in dealing with high data rates and system loads, respectively. Third, cost, size and power consumption issues put severe constraints on the receiver complexity at the mobile.

[0006] Multi-Carrier (MC) CDMA has recently gained increased momentum as candidate air interface for future broadband cellular systems, because it combines the advantages of CDMA with those of Orthogonal Frequency Division Multiplexing (OFDM). Indeed, OFDM enables high data rate transmissions by combatting ISI in the frequency-domain. Three different flavours of MC-CDMA exist, depending on the exact position of the CDMA and the OFDM component in the transmission scheme. The first variant, called MC-CDMA, performs the spreading operation before the symbol blocking (or serial-to-parallel conversion), which results in a spreading of the information symbols across the different subcarriers. However, like classical DS-CDMA, MC-CDMA does not exploit full frequency diversity gains, and requires receive diversity to ameliorate the effects caused by deep channel fades. The second variant, called MC-DS-CDMA, executes the spreading operation after the symbol blocking, resulting in a spreading of the information symbols along the time axis of the different subcarriers. However, like classical OFDM, MC-DS-CDMA necessitates bandwidth consuming Forward Error Correction (FEC) coding plus frequency-domain interleaving to mitigate frequency-selective fading. Finally, Multi-Tone (MT) DS-CDMA performs the spreading after the OFDM modulation, such that the resulting spectrum of each subcarrier no longer satisfies the orthogonality condition. Hence, MT-DS-CDMA suffers from ISI, Inter-Tone Interference (ITI), as well as MUI, and requires expensive multi-user detection techniques to achieve a reasonable performance. Alternatively, MUI-free MC transceivers, like AMOUR and Generalized Multi-Carrier (GMC) CDMA [G. B. Giannakis, Z. Wang, A. Scaglione, and S. Barbarossa, “AMOUR—Generalized Multicarrier Transceivers for Blind CDMA Regardless of Multipath”, IEEE Transactions on Communications, vol. 48, no. 12, pp. 2064-2076, December 2000], rely on Orthogonal Frequency Division Multiple Access (OFDMA) to retain the orthogonality among users, regardless of the multi-path channel. However, they do not inherit the advantages of CDMA related to universal frequency reuse in a cellular network, like increased capacity and simplified network planning.

[0007] Space-time coding techniques, that introduce both temporal and spatial correlation between the transmitted signals, are capable of supporting reliable high-data-rate communications without sacrificing precious bandwidth resources. Originally developed for frequency-flat fading channels, these techniques have recently been extended for frequency-selective fading channels [Y. Li, J. C. Chuang, and N. R. Sollenberger, “Transmitter diversity for OFDM systems and its impact on high-rate data wireless networks”, IEEE Journal on Selected Areas in Communications, vol. 17, no. 7, pp. 1233-1243, July 1999]. However, up to now the focus was mainly on point-to-point communication links, thereby neglecting the multiple access technique in the design of the transmission scheme.

[0008] Direct-Sequence (DS) Code Division Multiple Access (CDMA) has emerged as the predominant air interface technology for the 3G cellular standard, because it increases capacity and facilitates network planning in a cellular system, compared to conventional multiple access techniques like Frequency Division Multiple Access (FDMA) and Time Division Multiple Access (TDMA). Traditional Code Division Multiple Access (CDMA) systems, that employ a single antenna at both ends of the wireless link, rely on the orthogonality of the spreading codes to separate the different user signals in the downlink. For increasing chip rates, however, the time-dispersive nature of the multi-path channel destroys the orthogonality between the user signals, giving rise to Multi-User Interference (MUI). Single-Carrier Block Transmission (SCBT) DS-CDMA retains the orthogonality of the spreading codes regardless of the underlying multi-path channel. However, the spectral efficiency and hence the user data rate of SCBT-DS-CDMA systems is limited by the received signal-to-noise ratio. On the other hand, Multiple Input Multiple Output (MIMO) systems that employ M_(t) transmit and M_(r) receive antennas, realize an M_(min)-fold capacity increase in rich scattering environments, where M_(min)=min {M_(t), M_(r)} is called the multiplexing gain. Space-Time Block Coding (STBC) is an important class of MIMO communication techniques that achieve high Quality of Service (QoS) over frequency-flat fading channels by introducing both temporal and spatial correlation between the transmitted signals. Time Reversal (TR) STBC has recently been combined with Single-Carrier Block Transmission (SCBT) to achieve maximum diversity gains over frequency-selective fading channels [N. Al-Dhahir, “Single-carrier frequency-domain equalization for space-time block-coded transmissions over frequency-selective fading channels”, IEEE Communications Letters, vol. 5, no. 7, pp. 304-306, July 2001]. However, up to now the focus was mainly on single-user point-to-point communication links, thereby neglecting the multiple access technique in the design of the transmission scheme.

[0009] Code Division Multiple Access (CDMA) systems rely on the orthogonality of the spreading codes to separate the different user signals in the downlink. For increasing chip rates, however, the time-dispersive nature of the multi-path channel destroys the orthogonality between the users, giving rise to multi-user interference (MUI). For conventional Direct-Sequence (DS) CDMA systems, chip-level equalization has been shown to completely or partially restore the orthogonality and suppress the MUI. However, DS-CDMA chip-level equalization requires multiple receive antennas (two for chip rate sampling in a single-cell context) to guarantee a Zero-Forcing (ZF) solution under some weak constraints on the channel. On the other hand, Single-Carrier Block Transmission (SCBT) DS-CDMA, leading to a Chip-Interleaved Block-Spread (CIBS) CDMA transmission [S. Zhou, G. B. Giannakis, and C. Le Martret, “Chip-Interleaved Block-Spread Code Division Multiple Access”, IEEE Transactions on Communications, vol. 50, no. 2, pp. 235-248, February 2002], only requires a single receive antenna because it effectively deals with the frequency-selectivity of the channel through Zero Padding (ZP) the chip blocks. Moreover, SCBT-DS-CDMA preserves the orthogonality of the user signals regardless of the underlying multi-path channel, which enables deterministic Maximum Likelihood (ML) multi-user separation through low-complexity code-matched filtering. Increased equalization flexibility and reduced complexity are other beneficial properties that favor SCBT-DS-CDMA for broadband downlink transmission compared to conventional DS-CDMA. However, the equalizer coefficients are calculated based on exact or approximate channel knowledge, with the latter being obtained from either a subspace-based or a finite-alphabet-based channel estimator. This approach leads to a high computational burden in a mobile system setup where the time-varying multi-path channel requires frequent recalculation of the equalizer coefficients.

[0010] In the downlink of traditional Single Input Single Output (SISO) DS-CDMA systems that employ a single antenna at both ends of the wireless link, the different user signals are synchronously multiplexed with short orthogonal spreading codes that are user specific and a long overlay scrambling code that is base station specific. The Multi-User Interference (MUI) experienced by a particular mobile station is essentially caused by the multi-path channel, that destroys the orthogonality of the user signals. Chip-level equalization followed by descrambling and despreading effectively copes with the MUI by restoring the orthogonality between the user signals. Practical training-based and semi-blind methods for direct chip equalizer estimation exploit the presence of either code-multiplexed or time-multiplexed pilot symbols. However, the spectral efficiency and hence the user data rate of traditional SISO DS-CDMA systems is limited by the received signal-to-noise ratio. On the other hand, Multiple Input Multiple Output (MIMO) systems that employ M_(T) transmit and M_(R) receive antennas, realize an M_(min)-fold capacity increase in rich scattering environments, where M_(min)=min {M_(T), M_(R)} is called the multiplexing gain. Spatial multiplexing, a.k.a. BLAST, is a MIMO communication technique that achieves high spectral efficiencies by transmitting independent data streams from the different transmit antennas. Zero Forcing (ZF) and Minimum Mean Squared Error (MMSE) detection algorithms for narrowband point-to-point D-BLAST and V-BLAST communication architectures have been considered. These results are extended for wideband point-to-multipoint MIMO channels in [H. Huang, H. Viswanathan, and G. J. Foschini, “Achieving high data rates in CDMA systems using BLAST techniques”, Proceedings of GLOBECOM, November 1999, vol. 5, pp. 2316-2320, IEEE], combining DS-CDMA with BLAST techniques for the cellular downlink. The receiver structure proposed there uses a generalization of the V-BLAST algorithm based on the space-time decorrelating multi-user detector to deal with the MUI. However, since this receiver algorithm does not effectively exploit the structure of the downlink problem, it comes at a very high cost which can hardly be justified for the mobile station.

AIMS OF THE INVENTION

[0011] The present invention aims to provide a method and device for Wideband Code Division Multiple Access (WCDMA) wireless communication that preserves the orthogonality among users and guarantees symbol detection regardless of the frequency-selective fading channels.

SUMMARY OF THE INVENTION

[0012] The invention relates to a method for multi-user wireless transmission of data signals in a communication system having at least one base station and at least one terminal. It comprises, for a plurality of users, the following steps:

[0013] adding robustness to frequency-selective fading to said data to be transmitted,

[0014] performing spreading and scrambling of at least a portion of a block of data, obtainable by grouping data symbols by demultiplexing using a serial-to-parallel operation,

[0015] combining (summing) spread and scrambled portions of said blocks of at least two users,

[0016] adding transmit redundancy to said combined spread and scrambled portions, and

[0017] transmitting said combined spread and scrambled portions with transmit redundancy.

[0018] Preferably the spreading and scrambling operation is performed by a code sequence, obtained by multiplying a user (terminal)-specific code and a base station specific scrambling code.

[0019] Preceding the steps mentioned above, the step of generating a plurality of independent block portions can be performed.

[0020] The method can also start with the step of generating block portions.

[0021] Advantageously all the steps are performed as many times as there are block portions, thereby generating streams comprising a plurality of combined spread and scrambled block portions.

[0022] In a specific embodiment, the step of encoding each of said streams is comprised between the step of combining and the step of transmitting said spread and scrambled portions.

[0023] More specifically, the step of space-time encoding said streams is comprised, thereby combining information from at least two of said streams.

[0024] Even more specifically the step of space-time encoding the streams is performed by block space-time encoding or trellis space-time encoding. For further details about trellis coding, reference is made to U.S. patent applications U.S. Ser. No. 09/507,545, U.S. Ser. No. 10/354,262 and U.S. Ser. No. 10/151,700, filed respectively on Feb. 18, 2000, Feb. 28, 2003 and May 17, 2002, which are hereby incorporated in their entirety by reference.

[0025] In an alternative embodiment, the step of inverse subband processing is comprised between the step of combining and the step of transmitting the spread and scrambled portions.

[0026] In an advantageous embodiment the step of adding robustness to frequency-selective fading is performed by adding linear precoding.

[0027] Alternatively, the step of adding robustness to frequency-selective fading is performed by applying adaptive loading per user.

[0028] In a typical embodiment the step of combining spread and scrambled block portions includes the summing of a pilot signal.

[0029] In the method of the invention the step of adding transmit redundancy comprises the addition of a cyclic prefix, a zero postfix or a symbol postfix.
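
The three forms of transmit redundancy named above can be illustrated with a minimal Python sketch. The function names, the guard length and the example values are hypothetical and chosen only for illustration; the symbol postfix is assumed here to be a fixed sequence known at both ends of the link.

```python
def add_cyclic_prefix(block, guard_len):
    """Copy the last guard_len samples of the block in front of it."""
    if guard_len == 0:
        return list(block)
    return list(block[-guard_len:]) + list(block)


def add_zero_postfix(block, guard_len):
    """Append guard_len zeros after the block."""
    return list(block) + [0] * guard_len


def add_symbol_postfix(block, known_symbols):
    """Append a fixed symbol sequence (assumed known to the receiver)."""
    return list(block) + list(known_symbols)


block = [1, -1, 1, 1]
with_prefix = add_cyclic_prefix(block, 2)
with_zeros = add_zero_postfix(block, 2)
with_known = add_symbol_postfix(block, [1, 1])
print(with_prefix, with_zeros, with_known)
```

In each case the guard interval would normally be chosen at least as long as the channel order, so that the time dispersion of one block does not spill into the next.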

[0030] The invention also relates to a transmit system device for wireless multi-user communication, applying the method here described.

[0031] Another object of the invention relates to a transmit apparatus for wireless multi-user communication, comprising:

[0032] Circuitry for grouping data symbols to be transmitted,

[0033] Means for applying a spreading and scrambling operation to said grouped data symbols,

[0034] Circuitry for adding transmit redundancy to said spread and scrambled grouped data symbols,

[0035] At least one transmit antenna for transmitting said spread and scrambled grouped data symbols with transmit redundancy.

[0036] In a specific embodiment the transmit apparatus also comprises means for adding robustness to frequency-selective fading to the grouped data symbols.

[0037] In a preferred embodiment the transmit apparatus also comprises a space-time encoder.

[0038] In another preferred embodiment the transmit apparatus also comprises circuits for inverse subband processing.

[0039] The invention also relates to a method for receiving at least one signal in a multi-user wireless communication system having at least one base station and at least one terminal, comprising the steps of

[0040] Receiving a signal from at least one antenna,

[0041] Subband processing of a version of said received signal,

[0042] Separating the contributions of the various users in said received signal,

[0043] Exploiting the additional robustness to frequency-selective fading property of said received signal.

[0044] In a particular embodiment the step of separating the contributions consists in first filtering at chip rate at least a portion of the subband processed version of said received signal and then despreading.

[0045] In another particular embodiment the step of separating the contributions consists in first despreading and then filtering at least a portion of the subband processed version of said received signal.

[0046] In a typical embodiment the step of receiving a signal is performed for a plurality of antennas, thereby generating data streams, and the step of subband processing is performed on each of said data streams, yielding a subband processed version of said received signal.

[0047] In a specific embodiment the additional step of space-time decoding is performed on each of the streams.

[0048] To be even more precise the step of space-time decoding can be performed by block decoding or trellis decoding.

[0049] In another embodiment the additional step of inverse subband processing is performed on at least one filtered, subband processed version of the received signal.

[0050] Preferably the step of filtering is carried out by a filter of which the coefficients are determined in a semi-blind fashion or in a training-based way.

[0051] In another embodiment the step of filtering is carried out by a filter of which the filter coefficients are determined without channel estimation.

[0052] Advantageously the step of filtering at chip rate is carried out by a filter of which the filter coefficients are determined such that one version of the filtered signal is as close as possible to a version of the pilot symbol.

[0053] More in particular, the version of the filtered signal is the filtered signal after despreading with a composite code of the base station specific scrambling code and the pilot code, and the version of the pilot symbol is the pilot symbol itself, put in per tone ordering.

[0054] In another particular embodiment the version of the filtered signal is the filtered signal after projecting onto the orthogonal complement of the subspace spanned by the composite codes of the base station specific scrambling code and the user specific codes. The version of the pilot symbol is the pilot symbol spread with a composite code of the base station specific scrambling code and the pilot code, and put in per tone ordering.

[0055] Typically, the additional step of removing transmit redundancy is performed.

[0056] In a particular embodiment the additional robustness to fading is exploited by linear de-precoding.

[0057] The invention also relates to a receive system device for wireless multi-user communication, applying the method as described above.

[0058] Another object of the invention relates to a receiver apparatus for wireless multi-user communication, comprising:

[0059] A plurality of antennas receiving signals,

[0060] A plurality of circuits adapted for subband processing of said received signals,

[0061] Circuitry being adapted for determining by despreading an estimate of subband processed symbols received by at least one user.

[0062] In an embodiment the circuitry adapted for determining an estimate of symbols comprises a plurality of circuits for inverse subband processing.

[0063] In a specific embodiment the circuitry adapted for determining an estimate of symbols further comprises a plurality of filters to filter at least a portion of a subband processed version of said received signals.

[0064] Even more specifically the filtering is performed at chip rate.

[0065] Finally, the apparatus further comprises a space-time decoder.

SHORT DESCRIPTION OF THE DRAWINGS

[0066] FIG. 1 represents a telecommunication system in a single-cell configuration.

[0067] FIG. 2 represents a telecommunication system in a multiple-cell configuration.

[0068] FIG. 3 represents a block diagram of a receiver structure.

[0069] FIG. 4 represents a block diagram of a transmitter structure.

[0070] FIG. 5 represents the Multi-Carrier Block-Spread CDMA downlink transmission scheme.

[0071] FIG. 6 represents the MUI-resilient MCBS-CDMA downlink reception scheme.

[0072] FIG. 7 represents the Space-Time Block Coded MCBS-CDMA downlink transmission scheme.

[0073] FIG. 8 represents the MUI-resilient STBC/MCBS-CDMA MIMO reception scheme.

[0074] FIG. 9 represents a comparison of Linear versus Decision Feedback Joint Equalization and Decoding.

[0075] FIG. 10 represents a comparison of Separate versus Joint Linear Equalization and Decoding.

[0076] FIG. 11 represents a comparison of DS-CDMA and MCBS-CDMA for small system load.

[0077] FIG. 12 represents a comparison of DS-CDMA and MCBS-CDMA for large system load.

[0078] FIG. 13 represents the STBC/MCBS-CDMA performance for channel order L_(c)=1.

[0079] FIG. 14 represents the STBC/MCBS-CDMA performance for channel order L_(c)=3.

[0080] FIG. 15 represents the transmitter model for Space-Time Coded MC-DS-CDMA with Linear Precoding.

[0081] FIG. 16 represents the receiver model for Space-Time Coded MC-DS-CDMA with Linear Precoding.

[0082] FIG. 17 represents the performance comparison of the different equalizers without linear precoding.

[0083] FIG. 18 represents the performance comparison of the different equalizers with linear precoding.

[0084] FIG. 19 represents the transmitter model for Space-Time Coded SCBT-DS-CDMA.

[0085] FIG. 20 represents the receiver model for Space-Time Coded SCBT-DS-CDMA.

[0086] FIG. 21 represents the performance of ST Coded SCBT-DS-CDMA for half system load.

[0087] FIG. 22 represents the performance of ST Coded SCBT-DS-CDMA for full system load.

[0088] FIG. 23 represents the base station transmitter model for SCBT-DS-CDMA with KSP.

[0089] FIG. 24 represents the mobile station receiver model for SCBT-DS-CDMA with KSP.

[0090] FIG. 25 represents the SCBT-DS-CDMA/KSP equalizer performance for large burst length.

[0091] FIG. 26 represents the SCBT-DS-CDMA/KSP equalizer performance for small burst length.

[0092] FIG. 27 represents the base station transmitter model for spatially multiplexed DS-CDMA.

[0093] FIG. 28 represents the mobile station receiver model with initial linear stage and K non-linear ST-PIC/RAKE stages.

[0094] FIG. 29 represents the initial linear stage based on space-time chip-level equalization.

[0095] FIG. 30 represents the k-th PIC/RAKE stage focused on the j-th transmit stream of the l-th user.

[0096] FIG. 31 shows that one ST-PIC/RAKE stage achieves a 6.5 dB gain for the SB-ST-CLEQ.

[0097] FIG. 32 shows that a second ST-PIC/RAKE stage offers an additional 1.5 dB gain for the SB-ST-CLEQ.

DETAILED DESCRIPTION OF THE INVENTION

[0098] The invention presents methods for W-CDMA wireless communication between devices, together with the related devices (FIG. 1). In the communication system at least data (10) is transmitted from at least one base station (100) to at least one terminal (200). The communication method is extendable to a case with a plurality of base stations (FIG. 2, 100, 101), each base station being designed for covering a single cell (FIG. 2, 20, 21) around such base station. In such a multiple base station, and hence multi-cell, case a terminal typically receives signals from both the nearest base station and other base stations. Within the method it is assumed that the base station has at least one antenna (110) and the terminal also has at least one physical antenna (210). The communication between the base station(s) and the terminal is designed such that said communication is operable in a context with multiple terminals. Hence it is assumed that substantially simultaneous communication between said base station(s) and a plurality of terminals is taking place, while still being able to distinguish at the terminal side which information was intended to be transmitted to a dedicated terminal.

[0099] The notion of a user is introduced. It is assumed that with each terminal in such a multi-terminal context at least one user is associated. The invented communication method and related devices exploit spreading with orthogonal codes as a method for separating information streams associated with different users. Hence at the base station side information, more in particular data symbols, of different users, hence denoted user specific data symbols, is available. After spreading, spread user specific data symbols are obtained. These spread user specific data symbols are added, leading to a sum signal of spread user specific data symbols. Additional scrambling of said sum signal is then performed with a scrambling code that is base station specific. Symbols obtained after spreading, or spreading and scrambling, and summing are denoted chip symbols. In the invented communication method blocks with a plurality of said chip symbols are transmitted. In a single base station case the transmitted signal thus comprises a plurality of time overlapped coded signals, each coded signal being associated with an individual user and distinguishable only by a user specific encoding, based on the user signature or spreading codes. In a multiple base station context, the distinguishing also exploits said base station specific code. Further, such blocks have at least one pilot symbol (also called training sequence), being predetermined and known at both sides of the transmission link.
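
The spread, sum and scramble chain described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the transmitter of the invention: the length-4 Walsh codes, the real-valued scrambling sequence SCRAMBLE and the function names are hypothetical, one symbol per user is sent, and the channel is taken to be ideal.

```python
# Hypothetical length-4 Walsh codes, one row per user.
WALSH = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
# Hypothetical base station specific scrambling code (real +/-1 for simplicity).
SCRAMBLE = [1, -1, -1, 1]


def downlink_chips(symbols_per_user):
    """Spread each user's symbol, sum over users, then scramble the sum."""
    n = len(SCRAMBLE)
    summed = [
        sum(sym * WALSH[u][i] for u, sym in enumerate(symbols_per_user))
        for i in range(n)
    ]
    return [summed[i] * SCRAMBLE[i] for i in range(n)]


def detect(chips, user):
    """Descramble, then correlate with the user's spreading code."""
    n = len(chips)
    return sum(chips[i] * SCRAMBLE[i] * WALSH[user][i] for i in range(n)) / n


chips = downlink_chips([1, -1, 1])          # three active users
estimates = [detect(chips, u) for u in range(3)]
print(chips, estimates)
```

Over an ideal channel each user's symbol is recovered exactly, and correlating with the unused fourth code yields zero, which is the orthogonality property the method relies on.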

[0100] In many embodiments of the invented method the availability of a receiver being capable of generating at least two independent signals from a received signal is foreseen. Said receiver receives a spread-spectrum signal, corresponding to a superposition of the signals of all users active in the communication system or link; more in particular, said superposition of signals is channel distorted. Said generation of at least two independent signals can be obtained by having at least two antennas at the terminal, each independent signal being the signal received at such antenna after the typical down-converting and filtering steps.

[0101] In case of a single antenna terminal, polarization diversity of said single antenna can be exploited or the temporal oversampling of the transmitted signal can be used. Because of the time-dispersive nature of the multi-path channel the independent signals are channel distorted versions of the signal transmitted by the base station(s). Alternatively it can be stated that the receiver or front-end circuitry provides samples, typically complex baseband samples, for a plurality of channels (either via different antennas or via polarization diversity or oversampling) in digital format.

[0102] Recall that the invention exploits spreading with orthogonal codes for separating different users. Unfortunately said channel distortion destroys the orthogonality of the used codes, leading to a poor separation. This problem is known as multi-user interference (MUI). Hence it is a requirement for the method of the invention to allow retrieval of a desired user's symbol sequence from a received signal transmitted in a communication context with severe multi-user interference. An additional aid in achieving this goal can come from the use of transmit redundancy. Applying transmit redundancy helps to remove or at least to weaken the effect of the time dispersion of the multi-path channel. A well known example of this is the addition of a cyclic prefix in a multi-carrier system. The method of the invention comprises a step of inputting or receiving said received signal, being a channel distorted version of a transmitted signal comprising a plurality of user data symbol sequences, each being encoded with a known, user specific code.
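
The loss of orthogonality can be made concrete with a small numerical sketch in Python. The two codes, the two-tap channel [1.0, 0.5] and the function names are assumptions chosen for illustration only; scrambling and noise are omitted.

```python
def channel(chips, taps):
    """Convolve chips with the channel taps, truncated to the block length."""
    return [
        sum(taps[k] * chips[i - k] for k in range(len(taps)) if i - k >= 0)
        for i in range(len(chips))
    ]


def correlate(received, code):
    """Normalized correlation of the received chips with a spreading code."""
    return sum(r * c for r, c in zip(received, code)) / len(code)


code_a = [1, -1, 1, -1]   # code of the transmitting user
code_b = [1, 1, -1, -1]   # code of a different, silent user
tx = [1 * c for c in code_a]   # user A sends the symbol +1

leak_flat = correlate(channel(tx, [1.0]), code_b)            # single-path channel
leak_multipath = correlate(channel(tx, [1.0, 0.5]), code_b)  # one delayed echo
print(leak_flat, leak_multipath)
```

With a single path the correlation with the other user's code is zero, whereas the delayed echo leaves a residual of 0.125: user A's signal now leaks into the detector of user B, which is the MUI referred to above.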

[0103] The multi-channel propagation also gives rise to multipath fading, which generally exhibits both frequency-selectivity and time-selectivity. The phenomenon can give rise to serious performance degradation and constitutes a bottleneck for higher data rates. Frequency-selective fading can be tackled in several ways, as is discussed below.

[0104] In the method of the invention the multi-user interference is suppressed by performing operations on the chip symbols. This multi-user interference suppression is obtained by combining said independent signals, resulting in a combined filtered signal. In an embodiment of the invention said combining, also denoted chip-level equalization, is a linear space-time combining. For said combining a combiner filter (chip-level equalizer) is used. The (chip-level equalization) filter coefficients of said combiner filter are determined directly from said independent signals, hence without estimating the channel characteristics. One can state that from said independent signals a chip-level equalization filter is determined in a direct and deterministic way. Said chip-level equalization filter is such that said transmitted signal is retrieved when applying said filter to said received signal.

[0105] In the approach of the invention, all system parameters are chosen such that the orthogonality between the various users is maintained, i.e., the MUI is combated in the most efficient way. To obtain that goal, orthogonal user specific spreading codes are used, a block spreading operation is applied to the symbols, and transmit redundancy is added such that the time dispersion is absorbed sufficiently. A block spreading operation is realized by converting a symbol sequence into blocks by a serial-to-parallel conversion. In this way, the various users can properly be decoupled. In order to enhance each user's robustness against deep fading effects, one additionally applies techniques like linear precoding (as illustrated by (1)) or adaptive loading. The linear precoder is selected to guarantee symbol detectability (for instance by setting a condition as given by (14)).
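
A block spreading operation of this kind can be sketched as follows. The sketch assumes one common form of block spreading, in which the whole symbol block is repeated once per code chip and scaled by that chip; the function names, block length and length-2 codes are hypothetical, and channel, scrambling and transmit redundancy are left out.

```python
def serial_to_parallel(symbols, block_len):
    """Group a symbol sequence into consecutive blocks of block_len symbols."""
    if len(symbols) % block_len:
        raise ValueError("sequence length must be a multiple of block_len")
    return [symbols[i:i + block_len] for i in range(0, len(symbols), block_len)]


def block_spread(block, code):
    """Emit one copy of the whole block per code chip, scaled by that chip."""
    return [[chip * s for s in block] for chip in code]


def block_despread(chip_blocks, code):
    """Combine the received chip blocks with the user's code, per symbol position."""
    n = len(code)
    width = len(chip_blocks[0])
    return [sum(code[j] * chip_blocks[j][i] for j in range(n)) / n for i in range(width)]


code_a, code_b = [1, 1], [1, -1]              # orthogonal user codes
block_a = serial_to_parallel([1, -1, 1], 3)[0]
block_b = serial_to_parallel([-1, -1, 1], 3)[0]

spread_a = block_spread(block_a, code_a)
spread_b = block_spread(block_b, code_b)
summed = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(spread_a, spread_b)]

print(block_despread(summed, code_a), block_despread(summed, code_b))
```

Because the code acts across whole blocks rather than across neighbouring chips, every symbol position is combined with the same code weights, so the users separate position by position.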

[0106] In case channel state information is available at the transmitter, e.g., for stationary or low-speed users, multicarrier transmission allows adaptive loading to be applied to exploit the inherent frequency diversity of the channel without adding extra redundancy. Since the different users are perfectly decoupled, adaptive loading can be performed on a per user basis, such that for every user the optimal transmit spectrum can be obtained without regard to the presence of other users. Specifically, adaptive loading assigns more information (through higher order constellations) and more power to the good subcarriers (with a high channel gain), while less information (through lower order constellations) and less power, or even none at all, is assigned to the bad subcarriers (with a low channel gain).
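As a hedged illustration of per-user adaptive loading, the sketch below uses classical water-filling power allocation. Water-filling is one common realization and is swapped in here for illustration; the text does not state that it is the loading rule of the invention. The function name `water_filling`, the definition of the per-subcarrier gain (channel power gain divided by noise variance) and the power budget are assumptions. Bit loading through constellation selection is not shown.

```python
import numpy as np

def water_filling(gains, total_power):
    """Allocate p_k = max(0, mu - 1/g_k) such that sum(p_k) = total_power.

    gains: per-subcarrier ratios |H_k|^2 / noise variance (assumed known
    at the transmitter, i.e. channel state information is available).
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]              # strongest subcarrier first
    inv = 1.0 / gains[order]
    power = np.zeros_like(gains)
    for n in range(len(gains), 0, -1):           # try the n best subcarriers
        mu = (total_power + inv[:n].sum()) / n   # water level for n active tones
        if mu > inv[n - 1]:                      # weakest active tone still gets power
            power[order[:n]] = mu - inv[:n]
            break
    return power

# Two good subcarriers and one in a deep fade (illustrative values).
p = water_filling([2.0, 1.0, 0.01], total_power=1.0)
```

In this example the faded subcarrier receives no power at all, and the strongest subcarrier receives the largest share, which mirrors the qualitative behaviour described above.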

[0107] In case no channel state information is available at the transmitter, e.g., for medium- to high-speed users, linear precoding can be applied to make the transmission robust against frequency-selective fading. Specifically, at the transmitter, the information symbols are linearly precoded on a subset of subcarriers, while adding some well-defined redundancy, such that the information symbols can always be uniquely recovered from this subset, even if some of the subcarriers are in a deep channel fade. At the receiver, the available frequency diversity is then exploited by performing either joint or separate equalization and decoding.

[0108] FIG. 3 shows a general scheme of a receiver system. The elements presented there are used in various combinations in the embodiments described below. One or preferably several antennas (210) are foreseen for receiving signals. Next, circuits (400) are provided to apply a subband processing to said received signals. The subband processed receive signals are then applied to a block (500) adapted for determining an estimate of the data symbols of at least one user. Hereby use can be made of several other functional blocks, performing the tasks of inverse subband processing (510), removing robustness added to the transmitted symbols in order to provide protection against frequency-selective fading (520), filtering (530) and/or despreading (540). Which operations are effectively applied and in which order highly depends on the specific embodiment. More details are given below.

[0109] FIG. 4 shows a general scheme of a transmitter system, comprising circuitry for at least grouping data symbols into blocks (600), at least one means (700) (710) for applying a spreading and scrambling operation to said grouped data symbols, transmit circuitry (750) and at least one transmit antenna (800) (810). Optionally a space-time encoder (900), combining the outcome of a plurality of said spreading and scrambling means (700) (710), is available. Said transmit circuitry can further comprise inverse subband processing means.

[0110] Subband processing of a data signal having a data rate comprises in principle splitting said data signal into a plurality of data signals with a lower data rate and modulating each of said plurality of data signals with another carrier. Said carriers are preferably orthogonal. In an embodiment said subband processing of a data signal can be realized by using serial-to-parallel convertors and using a transformation on a group of data samples of said data signal. For further details about subband coding, reference is made to U.S. patent application No. U.S. Ser. No. 09/552,150 filed on Apr. 18, 2000, which is hereby incorporated in its entirety by reference. For further details about an OFDM system, using an IFFT as subband coding and FFT for inverse subband coding, reference is made to U.S. patent application No. U.S. Ser. No. 09/505,228 filed on Feb. 16, 2000, which is hereby incorporated in its entirety by reference.

[0111] In a first embodiment of the invention a multi-carrier block-spread CDMA transceiver is disclosed that preserves the orthogonality between users and guarantees symbol detection. In this approach the M user data symbol sequences are transformed into a multi-user chip sequence. Apart from the multiplexing and the inverse subband processing, three major operations are performed: linear precoding, block spreading and adding transmit redundancy. Each user's data symbol sequence is converted into blocks and spread with a user specific composite code sequence, being the multiplication of an orthogonal spreading code specific to the user and a base station specific scrambling code. The chip block sequences of the other users are added and the resulting sum is IFFT transformed to the time domain. Then transmit redundancy is added to the chip blocks to cope with the time-dispersive effect of multi-path propagation. The resulting sequence is then transmitted. At the receiver side perfect synchronization is assumed. In the mobile station of interest the operations corresponding to those at the transmitter side are performed. The added transmit redundancy is removed and an FFT operation is performed. The FFT output is despread with the desired user's composite code sequence. This operation decouples the various users in the system, i.e. all MUI is successfully eliminated. For each individual user an equalization filter is then provided. The equalizer filters can be designed for jointly equalizing and decoding or for separately performing said operations.

[0112] In a second embodiment of the invention space-time coding techniques, originally proposed for point-to-point communication links, are extended to point-to-multipoint communication links. The multiple access technique is thereby taken into account in the design of the transmission scheme. Each user's data symbol sequence is converted into blocks. The resulting block sequence is then linearly precoded so as to add robustness against frequency-selective fading. The precoded data are put on the various tones of a multi-carrier system. Next the blocks are demultiplexed into a number of parallel sequences. Each of the sequences is spread with the same user code sequence, being the multiplication of the user specific orthogonal spreading code and the base-station specific scrambling code. In each of the parallel streams the different user chip block sequences are added up together with the pilot chip block sequence. Then a block space-time encoding operation takes place. The space-time coding is implemented on each tone separately at the base station. Instead of a block space-time encoding, a trellis space-time encoding scheme may also be envisaged. The usual multicarrier operations of inverse fast Fourier transforming and adding a cyclic prefix (by means of a transmit matrix T, as discussed in Paragraph IX-B.1.a) then follow in each stream before transmitting the signal. The cyclic prefix represents in this case said transmit redundancy. The mobile station of interest at the receiver side is equipped with multiple receive antennas. The operations corresponding to those at the transmitter side are performed on each of the received signals, starting with the cyclic prefix removal and the FFT operation. The space-time block decoding operation is performed. The space-time decoded output is next re-ordered on a tone-per-tone basis (for instance by means of permutation matrices, as illustrated in (44)). Next per-tone chip equalization is applied. The filter coefficients can be determined in a training-based or in a semi-blind way. In the training-based approach one relies on the knowledge of a pilot symbol. The equalizer coefficients are determined such that the equalized output after despreading is as close as possible to a version of the pilot symbol, being the pilot symbol itself, put in per tone ordering. In the semi-blind approach one relies not only on the knowledge of said pilot symbol, but also on characteristics of the codes. The equalizer filter output is projected on the orthogonal complement of the subspace spanned by the composite codes of the various users (i.e. the codes resulting from the multiplication of the base station specific scrambling code and the user specific codes). This projected output must then be as close as possible to the pilot symbol spread with a composite code of the base station specific scrambling code and the pilot code, and put in per tone ordering. From the equalizer output the contribution of that specific user can easily be derived after removing the additional robustness against deep fading. Examples of filter coefficient determination are given by formulas (56), (58), (77), (78), wherein closeness is determined in terms of a mean squared error norm.

[0113] In a third embodiment the base station again has multiple transmit antennas, but data are transmitted on a single carrier. The data symbol sequence is demultiplexed into several streams, which subsequently are converted into symbol blocks that are spread with the same user composite code sequence, being the multiplication of the user specific orthogonal spreading code and the base station specific scrambling code. The pilot chip block sequence is added to the different user chip block sequences. The chip blocks output by the space-time encoder are padded with a zero postfix (by means of a transmit matrix T_(zp) as discussed in Paragraph IX-C.1.a), parallel-to-serial converted and sent to the transmit antenna. Said postfix provides transmit redundancy. The receiver is again equipped with multiple receive antennas. Suppose the mobile station of interest has acquired perfect synchronization. After conversion into chip blocks the data on each receive antenna are block space-time decoded. Each decoded block is transformed into the frequency domain. The per receive antenna ordering is transformed into a per tone ordering. Then again a per tone chip level equalization can be performed. Re-transforming to the time domain and removing the zero postfix finally yields an estimate of the desired user's transmitted data symbols.

[0114] In a fourth embodiment the base station has a single transmit antenna, whereas the mobile station of interest may have multiple receive antennas. Each user's data symbol sequence is converted into symbol blocks and spread with the user composite code sequence, being the multiplication of a user specific spreading code and a base station specific scrambling code. The pilot symbol is treated in the same way. The different user chip block sequences and the pilot chip block sequence are added. At the end of each block a number of zeros are padded. Also a known symbol postfix, the length of which is the same as the number of zeros padded, is added to each block. After P/S conversion the chip sequence is transmitted. At the receiver side the mobile station of interest is equipped with multiple antennas and has acquired perfect synchronization. Assuming the known symbol postfix is long enough, there is no interblock interference present in the received signal. After transformation into the frequency domain a per tone chip equalizer filter is foreseen for each antenna. By transforming back into the time domain and removing the known symbol postfix the desired user's symbols can be retrieved.

[0115] In a fifth embodiment only spatial multiplexing is applied and no space-time block coding. Since spatial multiplexing combined with DS-CDMA leads to an increase of the amount of multi-user interference (as many times as there are transmit antennas), linear chip level equalization does not suffice to efficiently deal with the induced MUI. The multi-user receiver consists of an initial linear space-time chip-level equalization stage and possibly multiple non-linear space-time parallel interference cancellation stages with space-time RAKE combining.

[0116] Below various embodiments of the invention are described. The invention is not limited to these embodiments but only by the scope of the claims.

A. EMBODIMENT

A.1 Transceiver Design

[0117] Given the asymmetric nature of broadband services requiring much higher data rates in downlink than in uplink direction, we focus on the downlink bottleneck of future broadband cellular systems. Our goal is to design a transceiver that can cope with the three main challenges of broadband cellular downlink communications. First, multi-path propagation gives rise to time dispersion and frequency-selective fading causing ISI and ICI, which limit the maximum data rate of a system without equalization. Second, multiple users trying to access common network resources may interfere with each other, resulting in MUI, which upper-bounds the maximum user capacity in a cellular system. Specific to DS-CDMA downlink transmission, the MUI is essentially caused by multi-path propagation, since it destroys the orthogonality of the user signals. Third, cost, size and power consumption issues put severe constraints on the receiver complexity at the mobile.

[0118] We consider a single cell of a cellular system with a Base Station (BS) serving M active Mobile Stations (MSs) within its coverage area. For now, we limit ourselves to the single-antenna case and defer the multi-antenna case to Section IX-A.3.

A.1.a Multi-Carrier Block-Spread CDMA Transmission

[0119] The block diagram in FIG. 5 describes the Multi-Carrier Block-Spread (MCBS) CDMA downlink transmission scheme (where only the m-th user is explicitly shown), which transforms the M user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$ into the multi-user chip sequence $u[n]$ with a rate $1/T_c$. Apart from the user multiplexing and the IFFT, the transmission scheme performs three major operations, namely linear precoding, block spreading, and adding transmit redundancy. Since our scheme belongs to the general class of block transmission schemes, the m-th user's data symbol sequence $s^m[i]$ is first serial-to-parallel converted into blocks of B symbols, leading to the symbol block sequence $\mathbf{s}^m[i] := [s^m[iB], \ldots, s^m[(i+1)B-1]]^T$. The blocks $\mathbf{s}^m[i]$ are linearly precoded by a Q×B matrix $\Theta$ to yield the Q×1 precoded symbol blocks:

$$\tilde{\mathbf{s}}^m[i] := \Theta \cdot \mathbf{s}^m[i], \tag{1}$$

[0120] where the linear precoding can be either redundant (Q>B) or non-redundant (Q=B). For conciseness, we limit our discussion to redundant precoding, but the proposed concepts apply equally well to non-redundant precoding. As we will show later, linear precoding guarantees symbol detection and maximum frequency-diversity gains, and thus makes the transmission robust against frequency-selective fading. Unlike the traditional approach of symbol spreading that operates on a single symbol, we apply here block spreading that operates on a block of symbols. Specifically, the block sequence $\tilde{\mathbf{s}}^m[i]$ is spread by a factor N with the user composite code sequence $c^m[n]$, which is the multiplication of a short orthogonal Walsh-Hadamard spreading code that is MS specific and a long overlay scrambling code that is BS specific. The chip block sequences of the different active users are added, resulting in the multi-user chip block sequence:

$$\tilde{\mathbf{x}}[n] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i]\, c^m[n], \tag{2}$$

[0121] where the chip block index n is related to the symbol block index i by: $n = iN + n'$, $n' \in \{0, \ldots, N-1\}$. As will become apparent later, block spreading enables MUI-resilient reception, and thus effectively deals with the MUI. Subsequently, the Q×Q IFFT matrix $F_Q^H$ transforms the Frequency-Domain (FD) chip block sequence $\tilde{\mathbf{x}}[n]$ into the Time-Domain (TD) chip block sequence: $\mathbf{x}[n] = F_Q^H \cdot \tilde{\mathbf{x}}[n]$. The K×Q transmit matrix T, with $K \geq Q$, adds some redundancy to the chip blocks $\mathbf{x}[n]$: $\mathbf{u}[n] := T \cdot \mathbf{x}[n]$. As will be clarified later, this transmit redundancy copes with the time-dispersive effect of multi-path propagation and also enables low-complexity equalization at the receiver. Finally, the resulting transmitted chip block sequence $\mathbf{u}[n]$ is parallel-to-serial converted into the corresponding scalar sequence $[u[nK], \ldots, u[(n+1)K-1]]^T := \mathbf{u}[n]$ and transmitted over the air at a rate $1/T_c$.
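The transmit chain of equations (1) and (2), followed by the IFFT and the redundancy matrix T, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function names, the DFT-column precoder, the random unit-modulus scrambling sequence, the BPSK symbols and all dimensions are illustrative choices, and a cyclic prefix is assumed for T.

```python
import numpy as np

def walsh_hadamard(n):
    """n x n Walsh-Hadamard matrix (n must be a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def mcbs_cdma_transmit(symbols, theta, codes, cp_len):
    """Build the N transmitted chip blocks for one symbol block index i.

    symbols: (M, B) array, row m is the symbol block s^m[i]
    theta:   (Q, B) linear precoder
    codes:   (M, N) array, row m is the composite code of user m
    cp_len:  number of redundant chips (cyclic prefix assumed for T)
    Returns a (K, N) array whose column n' is the chip block u[iN + n'].
    """
    q = theta.shape[0]
    s_tilde = theta @ symbols.T                       # (Q, M): precoding, eq. (1)
    x_tilde = s_tilde @ codes                         # (Q, N): spreading and user sum, eq. (2)
    x = np.fft.ifft(x_tilde, axis=0, norm="ortho")    # unitary IFFT, i.e. F_Q^H
    return np.vstack([x[q - cp_len:, :], x])          # prepend cyclic prefix, K = Q + L

rng = np.random.default_rng(0)
M, B, Q, N, L = 3, 4, 6, 4, 2                         # illustrative sizes
theta = np.exp(2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
scramble = rng.choice([1, -1, 1j, -1j], size=N)       # stand-in for the BS scrambling code
codes = walsh_hadamard(N)[:M] * scramble / np.sqrt(N) # unit-norm composite codes
symbols = rng.choice([-1.0, 1.0], size=(M, B))        # BPSK symbol blocks
u = mcbs_cdma_transmit(symbols, theta, codes, L)
chips = u.T.reshape(-1)                               # parallel-to-serial conversion
```

Each column of `u` is one chip block of length K = Q + L, so a symbol block of B symbols per user occupies N·K chips on air.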

A.1.b Channel Model

[0122] Adopting a discrete-time baseband equivalent model, the chip-sampled received signal is a channel-distorted version of the transmitted signal, and can be written as:

$$v[n] = \sum_{l=0}^{L_c} h[l]\, u[n-l] + w[n], \tag{3}$$

[0123] where $h[l]$ is the chip-sampled FIR channel that models the frequency-selective multi-path propagation between the transmitter and the receiver including the effect of transmit and receive filters, $L_c$ is the order of $h[l]$, and $w[n]$ denotes the additive Gaussian noise, which we assume to be white with variance $\sigma_w^2$. Furthermore, we define L as a known upper bound on the channel order: $L \geq L_c$, which can be well approximated by

$$L \approx \left\lfloor \frac{\tau_{\max}}{T_c} \right\rfloor + 1,$$

[0124] where $\tau_{\max}$ is the maximum delay spread within the given propagation environment.
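For illustration, the chip-rate model (3) and the bound on the channel order can be written as two small helpers. The names `channel_order_bound` and `apply_channel` are hypothetical, and the numerical values (a delay spread of 1 µs and a chip rate of 3.84 Mchip/s) are assumptions chosen only to make the example concrete.

```python
import math
import numpy as np

def channel_order_bound(tau_max, chip_period):
    """Upper bound L ~ floor(tau_max / T_c) + 1 on the channel order."""
    return math.floor(tau_max / chip_period) + 1

def apply_channel(u, h, noise_std=0.0, rng=None):
    """Chip-rate model of eq. (3): v[n] = sum_l h[l] u[n-l] + w[n].

    u[n] is taken as zero for n < 0; the output is truncated to len(u).
    """
    u = np.asarray(u, dtype=complex)
    v = np.convolve(u, h)[: len(u)]
    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        w = rng.normal(size=len(u)) + 1j * rng.normal(size=len(u))
        v = v + noise_std * w / np.sqrt(2)   # complex white noise, variance noise_std**2
    return v

L = channel_order_bound(tau_max=1.0e-6, chip_period=1.0 / 3.84e6)
v = apply_channel([1, 0, 0, 0], h=[1.0, 0.5, 0.25])   # noise-free impulse response
```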

A.1.c MUI-Resilient Reception

[0125] The block diagram in FIG. 6 describes the reception scheme for the MS of interest (which we assume to be the m-th one), which transforms the received sequence $v[n]$ into an estimate of the desired user's data symbol sequence $\hat{s}^m[i]$. Assuming perfect synchronization, the received sequence $v[n]$ is serial-to-parallel converted into its corresponding block sequence $\mathbf{v}[n] := [v[nK], \ldots, v[(n+1)K-1]]^T$. From the scalar input/output relationship in (3), we can derive the corresponding block input/output relationship:

$$\mathbf{v}[n] = H[0] \cdot \mathbf{u}[n] + H[1] \cdot \mathbf{u}[n-1] + \mathbf{w}[n], \tag{4}$$

[0126] where $\mathbf{w}[n] := [w[nK], \ldots, w[(n+1)K-1]]^T$ is the noise block sequence, $H[0]$ is a K×K lower triangular Toeplitz matrix with entries $[H[0]]_{p,q} = h[p-q]$, and $H[1]$ is a K×K upper triangular Toeplitz matrix with entries $[H[1]]_{p,q} = h[K+p-q]$. The time-dispersive nature of multi-path propagation gives rise to so-called Inter-Block Interference (IBI) between successive blocks, which is modeled by the second term in (4). The Q×K receive matrix R again removes the redundancy from the blocks $\mathbf{v}[n]$: $\mathbf{y}[n] := R \cdot \mathbf{v}[n]$. The purpose of the transmit/receive pair (T, R) is twofold. First, it allows for simple block by block processing by removing the IBI. Second, it enables low-complexity frequency-domain equalization by making the linear channel convolution appear circulant to the received block. To guarantee perfect IBI removal, the pair (T, R) should satisfy the following condition:

$$R \cdot H[1] \cdot T = 0. \tag{5}$$

[0127] To enable circulant channel convolution, the resulting channel matrix $\dot{H} := R \cdot H[0] \cdot T$ should be circulant. In this way, we obtain a simplified block input/output relationship in the TD:

$$\mathbf{y}[n] = \dot{H} \cdot \mathbf{x}[n] + \mathbf{z}[n], \tag{6}$$

[0128] where $\mathbf{z}[n] := R \cdot \mathbf{w}[n]$ is the corresponding noise block sequence. In general, two options for the pair (T, R) exist that satisfy the above conditions. The first option corresponds to Cyclic Prefixing (CP) in classical OFDM systems, and boils down to choosing K=Q+L, and selecting:

[0129] $$T = T_{cp} := [I_{cp}^T,\ I_Q^T]^T, \qquad R = R_{cp} := [0_{Q \times L},\ I_Q], \tag{7}$$

[0130] where $I_{cp}$ consists of the last L rows of $I_Q$. The circulant property is enforced at the transmitter by adding a cyclic prefix of length L to each block. Indeed, premultiplying a vector with $T_{cp}$ copies its last L entries and pastes them to its top. The IBI is removed at the receiver by discarding the cyclic prefix of each received block. Indeed, premultiplying a vector with $R_{cp}$ deletes its first L entries and thus satisfies (5).

[0131] The second option corresponds to Zero Padding (ZP), and boils down to setting K=Q+L, and selecting:

$$T = T_{zp} := [I_Q^T,\ 0_{Q \times L}^T]^T, \qquad R = R_{zp} := [I_Q,\ I_{zp}], \tag{8}$$

[0132] where $I_{zp}$ is formed by the first L columns of $I_Q$. Unlike classical OFDM systems, here the IBI is entirely dealt with at the transmitter. Indeed, premultiplying a vector with $T_{zp}$ pads L trailing zeros to its bottom, and thus satisfies (5). The circulant property is enforced at the receiver by time-aliasing each received block. Indeed, premultiplying a vector with $R_{zp}$ adds its last L entries to its first L entries.
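Both options can be checked numerically. The sketch below builds H[0] and H[1] from their entry definitions, forms the CP and ZP pairs of (7) and (8), and tests condition (5) together with the circulant property of R·H[0]·T. The helper names, the example channel taps and the sizes Q = 6, L = 2 are assumptions made for illustration.

```python
import numpy as np

def channel_matrices(h, K):
    """H[0] with entries h[p-q] and H[1] with entries h[K+p-q] (both K x K)."""
    h = np.asarray(h, dtype=complex)
    diff = np.arange(K)[:, None] - np.arange(K)[None, :]      # p - q

    def taps(k):
        valid = (k >= 0) & (k < len(h))
        return np.where(valid, h[np.clip(k, 0, len(h) - 1)], 0)

    return taps(diff), taps(diff + K)

def cp_pair(Q, L):
    I = np.eye(Q)
    return np.vstack([I[Q - L:, :], I]), np.hstack([np.zeros((Q, L)), I])   # eq. (7)

def zp_pair(Q, L):
    I = np.eye(Q)
    return np.vstack([I, np.zeros((L, Q))]), np.hstack([I, I[:, :L]])       # eq. (8)

def circulant(h, Q):
    """Q x Q circulant matrix whose first column is h padded with zeros."""
    c = np.zeros(Q, dtype=complex)
    c[: len(h)] = h
    return c[(np.arange(Q)[:, None] - np.arange(Q)[None, :]) % Q]

Q, L = 6, 2
K = Q + L
h = [1.0, 0.5 - 0.3j, 0.2j]                  # example channel of order 2 (assumed)
H0, H1 = channel_matrices(h, K)
results = {}
for name, (T, R) in {"cp": cp_pair(Q, L), "zp": zp_pair(Q, L)}.items():
    ibi_free = bool(np.allclose(R @ H1 @ T, 0))                      # condition (5)
    is_circulant = bool(np.allclose(R @ H0 @ T, circulant(h, Q)))    # circulant channel
    results[name] = (ibi_free, is_circulant)
```

For any channel of order at most L, both entries of `results` should report that the IBI is removed and that the effective channel matrix is circulant.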

[0133] Referring back to (6), circulant matrices possess a nice property that enables simple per-tone equalization in the frequency domain.

Property 1

[0134] Circulant matrices can be diagonalized by FFT operations:

$$\dot{H} = F_Q^H \cdot \tilde{H} \cdot F_Q, \tag{9}$$

[0135] with

$$\tilde{H} := \mathrm{diag}(\tilde{\mathbf{h}}), \qquad \tilde{\mathbf{h}} := \left[ H(e^{j0}),\ H(e^{j\frac{2\pi}{Q}}),\ \ldots,\ H(e^{j\frac{2\pi}{Q}(Q-1)}) \right]$$

[0136] the FD channel response evaluated on the FFT grid, $H(z) := \sum_{l=0}^{L} h[l] z^{-l}$ the z-transform of $h[l]$, and $F_Q$ the Q×Q FFT matrix. Aiming at low-complexity FD processing, we transform $\mathbf{y}[n]$ into the FD by defining $\tilde{\mathbf{y}}[n] := F_Q \cdot \mathbf{y}[n]$. Relying on Property 1, this leads to the following FD block input/output relationship:

$$\tilde{\mathbf{y}}[n] = \tilde{H} \cdot \tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}[n], \tag{10}$$

[0137] where $\tilde{\mathbf{z}}[n] := F_Q \cdot \mathbf{z}[n]$ is the corresponding FD noise block sequence. Stacking N consecutive chip blocks $\tilde{\mathbf{y}}[n]$ into $\tilde{Y}[i] := [\tilde{\mathbf{y}}[iN], \ldots, \tilde{\mathbf{y}}[(i+1)N-1]]$, we obtain the symbol block level equivalent of (10):

$$\tilde{Y}[i] = \tilde{H} \cdot \tilde{X}[i] + \tilde{Z}[i], \tag{11}$$

[0138] where $\tilde{X}[i]$ and $\tilde{Z}[i]$ are similarly defined as $\tilde{Y}[i]$. From (2), we also have that:

$$\tilde{X}[i] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i] \cdot \mathbf{c}^m[i]^T, \tag{12}$$

[0139] where $\mathbf{c}^m[i] := [c^m[iN], \ldots, c^m[(i+1)N-1]]^T$ is the m-th user's composite code vector used to block spread its data symbol block $\tilde{\mathbf{s}}^m[i]$. By inspecting (11) and (12), we can conclude that our transceiver preserves the orthogonality among users, even after propagation through a (possibly unknown) frequency-selective multi-path channel. This property allows for deterministic MUI elimination through low-complexity code-matched filtering. Indeed, by block despreading (11) with the desired user's composite code vector $\mathbf{c}^m[i]$ (we assume the m-th user to be the desired one), we obtain:

$$\tilde{\mathbf{y}}^m[i] := \tilde{Y}[i] \cdot \mathbf{c}^m[i]^{*} = \tilde{H} \cdot \Theta \cdot \mathbf{s}^m[i] + \tilde{\mathbf{z}}^m[i], \tag{13}$$

[0140] where $\tilde{\mathbf{z}}^m[i] := \tilde{Z}[i] \cdot \mathbf{c}^m[i]^{*}$ is the corresponding noise block sequence. Our transceiver successfully converts (through block despreading) a multi-user chip block equalization problem into an equivalent single-user symbol block equalization problem. Moreover, the operation of block despreading preserves Maximum-Likelihood (ML) optimality, since it does not incur any information loss regarding the desired user's symbol block $\mathbf{s}^m[i]$.
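The chain from the circulant time-domain model (6) to the despread single-user model (13) can be verified numerically in the noise-free case. In this sketch the channel taps, the DFT-column precoder, the scrambling sequence and all sizes are assumed values for illustration, and the composite codes are normalized to unit norm so that despreading returns the precoded block without an extra scale factor.

```python
import numpy as np

rng = np.random.default_rng(1)
M, B, Q, N = 3, 4, 6, 4
h = np.array([1.0, 0.5 - 0.3j, 0.2j])                     # example FIR channel (assumed)

# Circulant channel matrix, unitary FFT matrix F_Q and diagonal FD channel.
c = np.zeros(Q, dtype=complex)
c[: len(h)] = h
H_dot = c[(np.arange(Q)[:, None] - np.arange(Q)[None, :]) % Q]
F = np.fft.fft(np.eye(Q), norm="ortho")
H_tilde = np.diag(np.fft.fft(h, Q))                       # H(e^{j 2 pi q / Q}) on the diagonal
property_1_holds = bool(np.allclose(H_dot, F.conj().T @ H_tilde @ F))   # eq. (9)

# Orthonormal composite codes: Walsh-Hadamard rows times a scrambling sequence.
wh = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]], dtype=float)
scramble = rng.choice([1, -1, 1j, -1j], size=N)
codes = wh[:M] * scramble / np.sqrt(N)                    # row m is c^m[i]
theta = np.exp(2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
symbols = rng.choice([-1.0, 1.0], size=(M, B))            # BPSK symbol blocks

X_tilde = (theta @ symbols.T) @ codes                     # eq. (12), Q x N
Y = H_dot @ (F.conj().T @ X_tilde)                        # TD blocks after (T, R), eq. (6)
Y_tilde = F @ Y                                           # FFT at the receiver, eq. (11)
m = 1                                                     # desired user (0-based index)
y_m = Y_tilde @ codes[m].conj()                           # block despreading, eq. (13)
```

After despreading, `y_m` depends only on the desired user's symbol block, whatever the other users transmitted.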

A.1.d Single-User Equalization

[0141] After successful elimination of the MUI, we still need to detect the desired user's symbol block $\mathbf{s}^m[i]$ from (13). Ignoring for the moment the presence of $\Theta$ (or equivalently setting Q=B and selecting $\Theta = I_Q$), this requires $\tilde{H}$ to have full column rank Q. Unfortunately, this condition only holds for channels that do not invoke any zero diagonal entries in $\tilde{H}$. In other words, if the MS experiences a deep channel fade on a particular tone (corresponding to a zero diagonal entry in $\tilde{H}$), the information symbol on that tone cannot be recovered. To guarantee symbol detectability of the B symbols in $\mathbf{s}^m[i]$, regardless of the symbol constellation, we thus need to design the precoder $\Theta$ such that:

$$\mathrm{rank}(\tilde{H} \cdot \Theta) = B, \tag{14}$$

[0142] irrespective of the underlying channel realization. Since an FIR channel of order L can invoke at most L zero diagonal entries in $\tilde{H}$, this requires any Q−L=B rows of $\Theta$ to be linearly independent. Two classes of precoders have been constructed that satisfy this condition and thus guarantee symbol detectability or equivalently enable full frequency-diversity gain, namely the Vandermonde precoders and the cosine precoders. For instance, a special case of the general cosine precoder is a truncated Discrete Cosine Transform (DCT) matrix.
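The row-independence condition can be checked by brute force for small sizes. The sketch below assumes one particular Vandermonde construction, with Q distinct nodes on the unit circle; the text does not specify which nodes are used, so this choice, the function names and the example channel with two spectral nulls on the FFT grid are illustrative assumptions.

```python
import itertools
import numpy as np

def vandermonde_precoder(Q, B):
    """Q x B Vandermonde matrix with distinct nodes exp(j 2 pi q / Q) (assumed choice)."""
    rho = np.exp(2j * np.pi * np.arange(Q) / Q)
    return np.vander(rho, B, increasing=True) / np.sqrt(Q)

def any_b_rows_independent(theta):
    """True if every selection of B rows of theta has full rank B."""
    Q, B = theta.shape
    return all(
        np.linalg.matrix_rank(theta[list(rows)]) == B
        for rows in itertools.combinations(range(Q), B)
    )

Q, B = 6, 4                                   # Q - B = L = 2
theta = vandermonde_precoder(Q, B)
h = np.array([1.0, 0.0, -1.0])                # H(z) = 1 - z^-2, zeros at z = 1 and z = -1
h_tilde = np.fft.fft(h, Q)                    # tones 0 and 3 fall on the zeros
n_faded = int(np.sum(np.abs(h_tilde) < 1e-9))
rank_precoded = int(np.linalg.matrix_rank(np.diag(h_tilde) @ theta))
```

With an identity precoder the two symbols on the nulled tones would be lost, whereas the precoded system keeps rank B and therefore satisfies (14) for this channel.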

A.2 Equalization Options

[0143] In this section, we discuss different options to perform equalization and decoding of the linear precoding, either jointly or separately. These options allow performance to be traded off against complexity, ranging from optimal Maximum-Likelihood (ML) detection with exponential complexity to linear and decision-directed detection with linear complexity. To evaluate the complexity, we distinguish between the initialization phase, where the equalizers are calculated, and the data processing phase, where the actual equalization takes place. The rate of the former is related to the channel's fading rate, whereas the latter is executed continuously at the symbol block rate.

A.2.a ML Detection

[0144] The ML algorithm is optimal in a Maximum Likelihood sense, but has a very high complexity. The likelihood function of the received block $\tilde{\mathbf{y}}^m[i]$, conditioned on the transmitted block $\mathbf{s}^m[i]$, is given by:

$$p\left(\tilde{\mathbf{y}}^m[i] \mid \mathbf{s}^m[i]\right) = \frac{1}{(\pi \sigma_w^2)^Q} \exp\left(-\frac{\left\| \tilde{\mathbf{y}}^m[i] - \tilde{H} \cdot \Theta \cdot \mathbf{s}^m[i] \right\|^2}{\sigma_w^2}\right). \tag{15}$$

[0145] Amongst all possible transmitted blocks, the ML algorithm retains the one that maximizes the likelihood function or, equivalently, minimizes the Euclidean distance:

$$\hat{\mathbf{s}}^m[i] = \arg \min_{\mathbf{s}^m[i] \in \mathcal{S}} \left\| \tilde{\mathbf{y}}^m[i] - \tilde{H} \cdot \Theta \cdot \mathbf{s}^m[i] \right\|^2. \tag{16}$$

[0146] In other words, the ML metric is given by the Euclidean distance between the actual received block and the block that would have been received if a particular symbol block had been transmitted in a noiseless environment. The number of possible transmit vectors in $\mathcal{S}$ is the cardinality of $\mathcal{S}$, i.e. $|\mathcal{S}| = M^B$, with M here denoting the constellation size. So, the number of points to inspect during the data processing phase grows exponentially with the initial block length B. Hence, this algorithm is only feasible for a small block length B and a small constellation size M. Note that the ML algorithm does not require an initialization phase.
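A direct, exhaustive implementation of (16) makes the exponential cost visible: the loop visits every one of the candidate blocks. This is a minimal sketch; the function name `ml_detect`, the BPSK constellation, the example channel, the DFT-column precoder and the noise level are assumptions for illustration.

```python
import itertools
import numpy as np

def ml_detect(y, A, constellation):
    """Exhaustive search of eq. (16); A stands for the product H_tilde @ Theta."""
    B = A.shape[1]
    best, best_metric = None, np.inf
    for cand in itertools.product(constellation, repeat=B):   # |constellation|**B candidates
        s = np.array(cand)
        metric = np.linalg.norm(y - A @ s) ** 2               # squared Euclidean distance
        if metric < best_metric:
            best, best_metric = s, metric
    return best

rng = np.random.default_rng(2)
Q, B = 6, 4
theta = np.exp(2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
H_tilde = np.diag(np.fft.fft([1.0, 0.5 - 0.3j, 0.2j], Q))     # example channel (assumed)
A = H_tilde @ theta
s_true = np.array([1.0, -1.0, -1.0, 1.0])
noise = 0.01 * (rng.normal(size=Q) + 1j * rng.normal(size=Q)) / np.sqrt(2)
s_hat = ml_detect(A @ s_true + noise, A, constellation=(-1.0, 1.0))
```

For BPSK and B = 4 only 16 candidates are inspected, but the count grows as the constellation size raised to the power B.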

A.2.b Joint Linear Equalization and Decoding

[0147] Linear equalizers that perform joint equalization and decoding combine a low complexity with medium performance. A first possibility is to apply a Zero-Forcing (ZF) linear equalizer:

$$G_{ZF} = \left(\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta\right)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \tag{17}$$

[0148] which completely eliminates the ISI, irrespective of the noise level. By ignoring the noise, it causes excessive noise enhancement, especially at low SNR. A second possibility is to apply a Minimum Mean-Square-Error (MMSE) linear equalizer:

$$G_{MMSE} = \left(\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2} I_B\right)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \tag{18}$$

[0149] which minimizes the MSE between the actual transmitted symbol block and its estimate. The MMSE linear equalizer explicitly takes into account the noise variance $\sigma_w^2$ and the information symbol variance $\sigma_s^2$, and balances ISI elimination with noise enhancement. From (17) and (18), it is also clear that $G_{MMSE}$ reduces to $G_{ZF}$ at high SNR.

[0150] During the initialization phase, $G_{ZF}$ and $G_{MMSE}$ can be computed from the multiple sets of linear equations, implicitly shown in (17) and (18), respectively. The solution can be found from Gaussian elimination with partial pivoting, based on the LU decomposition, leading to an overall complexity of $O(QB^2)$. During the data processing phase, the equalizers $G_{ZF}$ and $G_{MMSE}$ are applied to the received block $\tilde{\mathbf{y}}^m[i]$, leading to a complexity of $O(QB)$.
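The two joint linear equalizers of (17) and (18) can be computed by solving the underlying linear systems rather than forming an explicit inverse. The sketch below relies on `numpy.linalg.solve`, which uses an LU factorization with partial pivoting; the function name, the example channel and the DFT-column precoder are assumptions made for illustration.

```python
import numpy as np

def joint_linear_equalizers(H_tilde, theta, noise_var, symbol_var=1.0):
    """Return (G_ZF, G_MMSE) of eqs. (17) and (18), each of size B x Q."""
    A = H_tilde @ theta                                   # Q x B effective channel
    gram = A.conj().T @ A                                 # B x B matrix Theta^H H^H H Theta
    G_zf = np.linalg.solve(gram, A.conj().T)              # eq. (17)
    reg = (noise_var / symbol_var) * np.eye(A.shape[1])
    G_mmse = np.linalg.solve(gram + reg, A.conj().T)      # eq. (18)
    return G_zf, G_mmse

Q, B = 6, 4
theta = np.exp(2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
H_tilde = np.diag(np.fft.fft([1.0, 0.5 - 0.3j, 0.2j], Q))  # example channel (assumed)
G_zf, G_mmse = joint_linear_equalizers(H_tilde, theta, noise_var=0.1)
G_zf_hi, G_mmse_hi = joint_linear_equalizers(H_tilde, theta, noise_var=1e-12)
s = np.array([1.0, -1.0, -1.0, 1.0])
s_zf = G_zf @ (H_tilde @ theta @ s)                        # noise-free ZF output
```

Applying either equalizer to a received block is a single B×Q matrix-vector product, in line with the stated data processing complexity.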

A.2.c Joint Decision Feedback Equalization and Decoding

[0151] On the one hand, the ML algorithm of Subsection IX-A.2.a achieves the optimal performance but with a very high complexity. On the other hand, the linear equalizers of Subsection IX-A.2.b offer a low complexity but at a relatively poor performance. The class of non-linear equalizers that perform joint decision feedback equalization and decoding lies in between the former categories, both in terms of performance and complexity. Decision feedback equalizers exploit the finite alphabet property of the information symbols to improve performance relative to linear equalizers. They consist of a feedforward section, represented by the matrix W, and a feedback section, represented by the matrix B:

$$\hat{\mathbf{s}}^m[i] = \mathrm{slice}\left[ W \cdot \tilde{\mathbf{y}}^m[i] - B \cdot \hat{\mathbf{s}}^m[i] \right]. \tag{19}$$

[0152] The feedforward and feedback sections can be designed according to a ZF or MMSE criterion. In either case, B should be a strictly upper or lower triangular matrix with zero diagonal entries, in order to feed back decisions in a causal way. To design the decision feedback counterpart of the ZF linear equalizer, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta$ in (17):

$$\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta = (\Sigma_1 \cdot U_1)^H \cdot \Sigma_1 \cdot U_1, \tag{20}$$

[0153] where $U_1$ is an upper triangular matrix with ones along the diagonal, and $\Sigma_1$ is a diagonal matrix with real entries. The ZF feedforward and feedback matrices then follow from:

$$W_{ZF} = U_1 \cdot G_{ZF} = \Sigma_1^{-1} \cdot (U_1^H \cdot \Sigma_1)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \qquad B_{ZF} = U_1 - I_B. \tag{21}$$

[0154] The linear feedforward section $W_{ZF}$ suppresses the ISI originating from “future” symbols, the so-called pre-cursor ISI, whereas the non-linear feedback section $B_{ZF}$ eliminates the ISI originating from “past” symbols, the so-called post-cursor ISI.

[0155] Likewise, to design the decision feedback counterpart of the MMSE linear equalizer, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2} I_B$ in (18):

$$\Theta^H \cdot \tilde{H}^H \cdot \tilde{H} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2} I_B = (\Sigma_2 \cdot U_2)^H \cdot \Sigma_2 \cdot U_2, \tag{22}$$

[0156] where $U_2$ is an upper triangular matrix with ones along the diagonal, and $\Sigma_2$ is a diagonal matrix with real entries. The MMSE feedforward and feedback matrices can then be calculated as:

$$W_{MMSE} = U_2 \cdot G_{MMSE} = \Sigma_2^{-1} \cdot (U_2^H \cdot \Sigma_2)^{-1} \cdot \Theta^H \cdot \tilde{H}^H, \qquad B_{MMSE} = U_2 - I_B. \tag{23}$$

[0157] During the initialization phase, the feedforward and feedback filters are computed based on a Cholesky decomposition, leading to an overall complexity of $O(QB^2)$. During the data processing phase, the feedforward and feedback filters are applied to the received data according to (19), leading to a complexity of $O(QB)$. Note that the decision feedback equalizers involve the same order of complexity as their linear counterparts.
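The decision feedback design of (20) to (23) and the recursion implied by (19) can be sketched as follows. Since `numpy.linalg.cholesky` returns a lower triangular factor, its conjugate transpose is used as the product of the diagonal matrix and the unit upper triangular matrix. The function names, the BPSK slicer, the example channel and the precoder are assumptions; setting `noise_to_symbol_ratio` to zero gives the ZF design and a positive value gives the MMSE design.

```python
import numpy as np

def dfe_matrices(A, noise_to_symbol_ratio=0.0):
    """Feedforward W and feedback matrix from eqs. (20)-(23); A = H_tilde @ Theta."""
    B = A.shape[1]
    gram = A.conj().T @ A + noise_to_symbol_ratio * np.eye(B)
    chol = np.linalg.cholesky(gram)              # lower triangular, gram = chol @ chol^H
    sigma = np.real(np.diag(chol))               # diagonal of Sigma (real, positive)
    U = chol.conj().T / sigma[:, None]           # unit-diagonal upper triangular U
    G = np.linalg.solve(gram, A.conj().T)        # linear ZF or MMSE equalizer
    return U @ G, U - np.eye(B)

def dfe_detect(y, W, B_fb, slicer):
    """Eq. (19), solved from the last symbol backwards since B_fb is strictly upper triangular."""
    n = W.shape[0]
    s_hat = np.zeros(n, dtype=complex)
    ff = W @ y                                   # feedforward section
    for k in range(n - 1, -1, -1):
        s_hat[k] = slicer(ff[k] - B_fb[k, k + 1:] @ s_hat[k + 1:])
    return s_hat

Q, B = 6, 4
theta = np.exp(2j * np.pi * np.outer(np.arange(Q), np.arange(B)) / Q) / np.sqrt(Q)
A = np.diag(np.fft.fft([1.0, 0.5 - 0.3j, 0.2j], Q)) @ theta   # example channel (assumed)
W, B_fb = dfe_matrices(A)
s_true = np.array([1.0, -1.0, -1.0, 1.0])
s_hat = dfe_detect(A @ s_true, W, B_fb, slicer=lambda z: np.sign(z.real))   # BPSK slicer
```

Each decision only uses decisions already taken for higher-indexed symbols, which is what the strictly triangular structure of the feedback matrix is meant to guarantee.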

A.2.d Separate Linear Equalization and Decoding

[0158] Previously, we have only considered joint equalization and decoding of the linear precoding. However, in order to reduce the complexity even further with respect to the linear equalizers of Subsection IX-A.2.b, equalization and decoding can be performed separately as well:

$$\hat{\mathbf{s}}^m[i] = \Theta^H \cdot \tilde{G} \cdot \tilde{\mathbf{y}}^m[i], \tag{24}$$

[0159] where $\tilde{G}$ performs linear equalization only and tries to restore $\tilde{\mathbf{s}}^m[i]$, and $\Theta^H$ subsequently performs linear decoding only and tries to restore $\mathbf{s}^m[i]$.

[0160] The ZF equalizer perfectly removes the amplitude and phase distortion:

{tilde over (G)}_(ZF)=({tilde over (H)}^(H)·{tilde over (H)})⁻¹·{tilde over (H)}^(H),  (25)

[0161] but also causes excessive noise enhancement, especially on those tones that experience a deep channel fade. Since {tilde over (H)} is a diagonal matrix, the ZF equalizer decouples into Q parallel single-tap equalizers, acting on a per-tone basis in the FD. The MMSE equalizer balances amplitude and phase distortion with noise enhancement and can be expressed as:

{tilde over (G)}_(MMSE)=({tilde over (H)}^(H)·{tilde over (H)}+σ_(w)²R_({tilde over (s)})⁻¹)⁻¹·{tilde over (H)}^(H),  (26)

[0162] where R_({tilde over (s)}):=E{{tilde over (s)}^(m)[i]·{tilde over (s)}^(m)[i]^(H)}=σ_(s)²Θ·Θ^(H) is the covariance matrix of {tilde over (s)}^(m)[i]. If we neglect the color in the precoded symbols, R_({tilde over (s)})≈σ_(s)²I_(Q), the MMSE equalizer also decouples into Q parallel and independent single-tap equalizers.

[0163] During the initialization phase, {tilde over (G)}_(ZF) and {tilde over (G)}_(MMSE) are calculated from (25) and (26), respectively, where the matrix inversion reduces to Q parallel scalar divisions, leading to an overall complexity of O(Q). During the data processing phase, the received data is separately equalized and decoded, leading to an overall complexity of O(QB).
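As an illustration of the per-tone decoupling described above, the following minimal Python sketch computes the Q single-tap ZF and MMSE coefficients from the diagonal of the frequency-domain channel matrix. It assumes the white approximation of the precoded symbols (covariance σ_s² I_Q); the function and variable names are hypothetical and chosen only for this example.

```python
def single_tap_equalizers(h, noise_var, symbol_var):
    """Per-tone ZF and MMSE coefficients for a diagonal FD channel.

    h          : per-tone complex channel gains (diagonal of the channel matrix)
    noise_var  : noise variance sigma_w^2
    symbol_var : symbol variance sigma_s^2 (precoded symbols treated as white,
                 which is an approximation)
    """
    ratio = noise_var / symbol_var
    # ZF, per-tone form of (25): h* / |h|^2 (undefined on an exact channel null)
    g_zf = [hq.conjugate() / abs(hq) ** 2 for hq in h]
    # MMSE, per-tone form of (26) under the white approximation
    g_mmse = [hq.conjugate() / (abs(hq) ** 2 + ratio) for hq in h]
    return g_zf, g_mmse


def equalize(y, g):
    """Apply the Q independent single-tap equalizers to one received block."""
    return [gq * yq for gq, yq in zip(g, y)]


h = [1 + 1j, 0.1j, 2.0]  # the second tone is in a deep fade
g_zf, g_mmse = single_tap_equalizers(h, noise_var=0.1, symbol_var=1.0)
```

On the faded tone the ZF coefficient has a magnitude of about 10, whereas the MMSE coefficient stays below 1, which is the noise-enhancement trade-off discussed in the text.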

A.3 Extension to Multiple Antennas

[0164] As shown in Sections IX-A.1 and IX-A.2, MCBS-CDMA successfully addresses the challenges of broadband cellular downlink communications. However, the spectral efficiency of single-antenna MCBS-CDMA is still limited by the received signal-to-noise ratio and cannot be further improved by traditional communication techniques. As opposed to single-antenna systems, Multiple-Input Multiple-Output (MIMO) systems that deploy N_(T) transmit and N_(R) receive antennas enable an N_(min)-fold capacity increase in rich scattering environments, where N_(min)=min{N_(T), N_(R)} is called the multiplexing gain. Besides the time, frequency and code dimensions, MIMO systems create an extra spatial dimension that makes it possible to increase the spectral efficiency and/or to improve the performance. On the one hand, Space Division Multiplexing (SDM) techniques achieve high spectral efficiency by exploiting the spatial multiplexing gain. On the other hand, Space-Time Coding (STC) techniques achieve high Quality-of-Service (QoS) by exploiting diversity and coding gains. Besides the leverages they offer, MIMO systems also sharpen the challenges of broadband cellular downlink communications. First, time dispersion and ISI are now caused by N_(T)N_(R) frequency-selective multi-path fading channels instead of just 1. Second, MUI originates from N_(T)M sources instead of just M. Third, the presence of multiple antennas seriously impairs a low-complexity implementation of the MS. To tackle these challenges, we will demonstrate the synergy between our MCBS-CDMA waveform and MIMO signal processing. In particular, we focus on a space-time block coded MCBS-CDMA transmission, but the general principles apply equally well to a space-time trellis coded or a space division multiplexed MCBS-CDMA transmission.

A.3.a Space-Time Block Coded MCBS-CDMA Transmission

[0165] The block diagram in FIG. 7 describes the Space-Time Block Coded (STBC) MCBS-CDMA downlink transmission scheme (where only the m-th user is explicitly shown), that transforms the M user data symbol sequences {s^(m)[i]}_(m=1) ^(M) into N_(T) ST coded multi-user chip sequences {u_(n) _(t) [n]}_(n) _(t) ₌₁ ^(N) ^(_(T)) with a rate 1/T_(c). For conciseness, we limit ourselves to the case of N_(T)=2 transmit antennas. As for the single-antenna case, the information symbols are first grouped into blocks of B symbols and linearly precoded. Unlike the traditional approach of performing ST encoding at the scalar symbol level, we perform ST encoding at the symbol block level. Our ST encoder operates in the FD and takes two consecutive symbol blocks {{tilde over (s)}^(m)[2i], {tilde over (s)}^(m)[2i+1]} to output the following 2Q×2 matrix of ST coded symbol blocks:

$\begin{matrix}{\begin{bmatrix}{{\overset{\_}{s}}_{1}^{m}\left\lbrack {2i} \right\rbrack} & {{\overset{\_}{s}}_{1}^{m}\left\lbrack {{2i} + 1} \right\rbrack} \\{{\overset{\_}{s}}_{2}^{m}\left\lbrack {2i} \right\rbrack} & {{\overset{\_}{s}}_{2}^{m}\left\lbrack {{2i} + 1} \right\rbrack}\end{bmatrix} = {\begin{bmatrix}{{\overset{\sim}{s}}^{m}\left\lbrack {2i} \right\rbrack} & {- {{\overset{\sim}{s}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}^{*}} \\{{\overset{\sim}{s}}^{m}\left\lbrack {{2i} + 1} \right\rbrack} & {{\overset{\sim}{s}}^{m}\left\lbrack {2i} \right\rbrack}^{*}\end{bmatrix}.}} & (27)\end{matrix}$

[0166] At each time interval i, the ST coded symbol blocks {overscore (s)}₁ ^(m)[i] and {overscore (s)}₂ ^(m)[i] are forwarded to the first and the second transmit antenna, respectively. From (27), we can easily verify that the transmitted symbol block at time instant 2i+1 from one antenna is the conjugate of the transmitted symbol block at time instant 2i from the other antenna (with a possible sign change). This corresponds to a per-tone implementation of the classical Alamouti scheme for frequency-flat fading channels. As we will show later, this property allows for deterministic transmit stream separation at the receiver.
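The block-level encoding rule of (27) can be sketched in a few lines of Python. This is a minimal illustration for N_(T)=2 only; the function name and the list-based block representation are assumptions made for the example.

```python
def st_encode(s_even, s_odd):
    """Block-level Alamouti encoding of two precoded blocks, following (27).

    s_even, s_odd : precoded symbol blocks for time indices 2i and 2i+1
    Returns (antenna1, antenna2); each is [block at time 2i, block at time 2i+1].
    """
    antenna1 = [list(s_even), [-x.conjugate() for x in s_odd]]
    antenna2 = [list(s_odd), [x.conjugate() for x in s_even]]
    return antenna1, antenna2


s_even = [1 + 1j, 1 - 1j]
s_odd = [-1 + 1j, -1 - 1j]
ant1, ant2 = st_encode(s_even, s_odd)
```

In this sketch, the block sent from antenna 2 at time 2i+1 is the conjugate of the block sent from antenna 1 at time 2i, and the block sent from antenna 1 at time 2i+1 is the negated conjugate of the block sent from antenna 2 at time 2i.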

[0167] After ST encoding, the resulting symbol block sequences {{overscore (s)}_(n) _(t) ^(m)[i]}_(n) _(t) ₌₁ ^(N) ^(_(T)) are block spread and code division multiplexed with those of the other users:

$\begin{matrix}{{{{\overset{\sim}{x}}_{n_{t}}\lbrack n\rbrack} = {\sum\limits_{m = 1}^{M}{{{\overset{\_}{s}}_{n_{t}}^{m}\lbrack i\rbrack}{c^{m}\lbrack n\rbrack}}}},\quad {n = {{iN} + {n^{\prime}.}}}} & (28)\end{matrix}$

[0168] At this point, it is important to note that each of the N_(T) parallel block sequences is block spread by the same composite code sequence c^(m)[n], guaranteeing an efficient utilization of the available code space. As will become apparent later, this property allows for deterministic user separation at every receive antenna. After IFFT transformation and the addition of some form of transmit redundancy:

u _(n) _(t) [n]=T·F _(Q) ^(H) ·{tilde over (x)} _(n) _(t) [n],  (29)

[0169] the corresponding scalar sequences {u_(n) _(t) [n]}_(n) _(t) ₌₁ ^(N) ^(_(T)) are transmitted over the air at a rate 1/T_(c).

A.3.b MUI-Resilient MIMO Reception

[0170] The block diagram in FIG. 8 describes the reception scheme for the MS of interest, which transforms the different received sequences {v_(n) _(r) [n]}_(n) _(r) ₌₁ ^(N) ^(_(R)) into an estimate of the desired user's data sequence ŝ^(m)[i]. After transmit redundancy removal and FFT transformation, we obtain the multi-antenna counterpart of (11):

$\begin{matrix}{{{{\overset{\sim}{Y}}_{n_{r}}\lbrack i\rbrack} = {{\sum\limits_{n_{t} = 1}^{N_{T}}{{\overset{\sim}{H}}_{n_{r},n_{t}} \cdot {{\overset{\sim}{X}}_{n_{t}}\lbrack i\rbrack}}} + {{\overset{\sim}{Z}}_{n_{r}}\lbrack i\rbrack}}},} & (30)\end{matrix}$

[0171] where {tilde over (Y)}_(n) _(r) [i]:=[{tilde over (y)}_(n) _(r) [iN], . . . , {tilde over (y)}_(n) _(r) [(i+1)N−1]] stacks N consecutive received chip blocks {tilde over (y)}_(n) _(r) [n] at the n_(r)-th receive antenna, {tilde over (H)}_(n) _(r) _(,n) _(t) is the diagonal FD channel matrix from the n_(t)-th transmit to the n_(r)-th receive antenna, and {tilde over (X)}_(n) _(t) [i] and {tilde over (Z)}_(n) _(r) [i] are similarly defined as {tilde over (Y)}_(n) _(r) [i]. From (28) and (30), we can conclude that our transceiver retains the user orthogonality at each receive antenna, irrespective of the underlying frequency-selective multi-path channels. As in the single-antenna case, a low-complexity block despreading operation with the desired user's composite code vector c^(m)[i] deterministically removes the MUI at each receive antenna:

$\begin{matrix}{{{{{\overset{\_}{y}}_{n_{r}}^{m}\lbrack i\rbrack}\text{:}} = {{{{\overset{\sim}{Y}}_{n_{r}}\lbrack i\rbrack} \cdot {c^{m}\lbrack i\rbrack}^{*}} = {{\sum\limits_{n_{t} = 1}^{N_{T}}{{\overset{\sim}{H}}_{n_{r},n_{t}} \cdot {{\overset{\_}{s}}_{n_{t}}^{m}\lbrack i\rbrack}}} + {{\overset{\_}{z}}_{n_{r}}^{m}\lbrack i\rbrack}}}},} & (31)\end{matrix}$

[0172] Hence, our transceiver successfully converts (through block despreading) a multi-user MIMO detection problem into an equivalent single-user MIMO equalization problem.
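The deterministic MUI removal of (28), (30) and (31) can be checked numerically with a small noise-free Python sketch. It rests on several assumptions made only for illustration: length-4 Walsh-Hadamard codes without the scrambling code, two users, two tones, two transmit antennas, one receive antenna, and a despreader normalized by the code length. All names are hypothetical.

```python
WALSH_4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def block_spread(user_blocks, codes):
    """Spread one Q-symbol block per user and sum over users, as in (28).
    Returns a Q x N list of lists (tones x chips)."""
    n_tones, n_chips = len(user_blocks[0]), len(codes[0])
    return [[sum(blk[q] * code[n] for blk, code in zip(user_blocks, codes))
             for n in range(n_chips)] for q in range(n_tones)]


def through_channel(tx_per_antenna, h_per_antenna):
    """Noise-free per-tone channel of (30): sum over transmit antennas."""
    n_tones, n_chips = len(tx_per_antenna[0]), len(tx_per_antenna[0][0])
    return [[sum(h[q] * x[q][n] for x, h in zip(tx_per_antenna, h_per_antenna))
             for n in range(n_chips)] for q in range(n_tones)]


def block_despread(rx, code):
    """Correlate every tone with the desired user's code, as in (31).
    Dividing by the code length is a normalization assumed for this sketch."""
    n_chips = len(code)
    return [sum(row[n] * code[n].conjugate() for n in range(n_chips)) / n_chips
            for row in rx]


codes = WALSH_4[:2]                                      # two active users
ant1_blocks = [[1 + 1j, 1 - 1j], [-1 + 1j, -1 - 1j]]     # per user, antenna 1
ant2_blocks = [[-1 - 1j, 1 + 1j], [1 - 1j, -1 + 1j]]     # per user, antenna 2
h = [[0.5 + 0.2j, 1.0 - 0.3j], [0.1j, -0.7 + 0j]]        # per antenna, per tone
rx = through_channel(
    [block_spread(ant1_blocks, codes), block_spread(ant2_blocks, codes)], h)
y_user0 = block_despread(rx, codes[0])
```

After despreading with user 0's code, each tone of `y_user0` contains only user 0's two ST coded streams weighted by the two channel gains; the contribution of user 1 cancels exactly, independently of the channel values.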

A.3.c Single-User Space-Time Decoding

[0173] After MUI elimination, the information blocks s^(m)[i] still need to be decoded from the received block despread sequences {{overscore (y)}_(n) _(r) ^(m)[i]}_(n) _(r) ₌₁ ^(N) ^(_(R)). Our ST decoder decomposes into three steps: an initial ST decoding step and a transmit stream separation step for each receive antenna, and, finally, a receive antenna combining step.

[0174] The initial ST decoding step considers two consecutive symbol blocks {{overscore (y)}_(n) _(r) ^(m)[2i], {overscore (y)}_(n) _(r) ^(m)[2i+1]}, both satisfying the block input/output relationship of (31). By exploiting the ST code structure of (27), we arrive at:

$\begin{matrix}{{{{\overset{\_}{y}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack} = {{{\overset{\sim}{H}}_{n_{r},1} \cdot {{\overset{\_}{s}}_{1}^{m}\left\lbrack {2i} \right\rbrack}} + {{\overset{\sim}{H}}_{n_{r},2} \cdot {{\overset{\_}{s}}_{2}^{m}\left\lbrack {2i} \right\rbrack}} + {{\overset{\_}{z}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack}}},} & (32) \\{{{\overset{\_}{y}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}^{*} = {{{- {\overset{\sim}{H}}_{n_{r},1}^{*}} \cdot {{\overset{\_}{s}}_{2}^{m}\left\lbrack {2i} \right\rbrack}} + {{\overset{\sim}{H}}_{n_{r},2}^{*} \cdot {{\overset{\_}{s}}_{1}^{m}\left\lbrack {2i} \right\rbrack}} + {{{\overset{\_}{z}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}^{*}.}}} & (33)\end{matrix}$

[0175] Combining (32) and (33) into a single block matrix form, we obtain:

$\begin{matrix}{{\underset{\underset{{\overset{\_}{r}}_{n_{r}}^{m}{\lbrack i\rbrack}}{}}{\begin{bmatrix}{{\overset{\_}{y}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack} \\{{\overset{\_}{y}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}^{*}\end{bmatrix}} = {{\underset{\underset{{\overset{\_}{H}}_{n_{r}}}{}}{\begin{bmatrix}{\overset{\sim}{H}}_{n_{r},1} & {\overset{\sim}{H}}_{n_{r},2} \\{\overset{\sim}{H}}_{n_{r},2}^{*} & {- {\overset{\sim}{H}}_{n_{r},1}^{*}}\end{bmatrix}} \cdot \begin{bmatrix}{{\overset{\sim}{s}}^{m}\left\lbrack {2i} \right\rbrack} \\{{\overset{\sim}{s}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}\end{bmatrix}} + \underset{\underset{{\overset{\_}{\eta}}_{n_{r}}^{m}{\lbrack i\rbrack}}{}}{\begin{bmatrix}{{\overset{\_}{z}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack} \\{{\overset{\_}{z}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}^{*}\end{bmatrix}}}},} & (34)\end{matrix}$

[0176] where {overscore (s)}₁ ^(m)[2i]={tilde over (s)}^(m)[2i] and {overscore (s)}₂ ^(m)[2i]={tilde over (s)}^(m)[2i+1] follow from (27). From the structure of {overscore (H)}_(n) _(r) in (34), we can deduce that our transceiver retains the orthogonality among transmit streams at each receive antenna for each tone separately, regardless of the underlying frequency-selective multi-path channels. A similar property was also encountered in the classical Alamouti scheme, but only for single-user frequency-flat fading multi-path channels.

[0177] The transmit stream separation step relies on this property to deterministically remove the transmit stream interference through low-complexity linear processing. Let us define the Q×Q matrix {tilde over (D)}_(n) _(r) with non-negative diagonal entries as: {tilde over (D)}_(n) _(r) :=[{tilde over (H)}_(n) _(r) _(,1)·{tilde over (H)}_(n) _(r) _(,1)*+{tilde over (H)}_(n) _(r) _(,2)·{tilde over (H)}_(n) _(r) _(,2)*]^(1/2). From (34), we can verify that the channel matrix {overscore (H)}_(n) _(r) satisfies: {overscore (H)}_(n) _(r) ^(H)·{overscore (H)}_(n) _(r) =I₂{circle over (X)}{tilde over (D)}_(n) _(r) ², where {circle over (X)} stands for the Kronecker product. Based on {overscore (H)}_(n) _(r) and {tilde over (D)}_(n) _(r) , we can construct a unitary matrix {overscore (U)}_(n) _(r) :={overscore (H)}_(n) _(r) ·(I₂{circle over (X)}{tilde over (D)}_(n) _(r) ⁻¹), which satisfies {overscore (U)}_(n) _(r) ^(H)·{overscore (U)}_(n) _(r) =I_(2Q) and {overscore (U)}_(n) _(r) ^(H)·{overscore (H)}_(n) _(r) =I₂{circle over (X)}{tilde over (D)}_(n) _(r) . Performing unitary combining on (34) (through {overscore (U)}_(n) _(r) ^(H)) collects the transmit antenna diversity at the n_(r)-th receive antenna:

$\begin{matrix}{{\underset{\underset{{\overset{'}{r}}_{n_{r}}^{m}{\lbrack i\rbrack}}{}}{\begin{bmatrix}{{\overset{'}{y}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack} \\{{\overset{'}{y}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}\end{bmatrix}}:={{{\overset{\_}{U}}_{n_{r}}^{H} \cdot {{\overset{\_}{r}}_{n_{r}}^{m}\lbrack i\rbrack}} = {\begin{bmatrix}{{\overset{\sim}{D}}_{n_{r}} \cdot {{\overset{\sim}{s}}^{m}\left\lbrack {2i} \right\rbrack}} \\{{\overset{\sim}{D}}_{n_{r}} \cdot {{\overset{\sim}{s}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}}\end{bmatrix} + \underset{\underset{{\overset{'}{\eta}}_{n_{r}}^{m}{\lbrack i\rbrack}}{}}{\begin{bmatrix}{{\overset{'}{z}}_{n_{r}}^{m}\left\lbrack {2i} \right\rbrack} \\{{\overset{'}{z}}_{n_{r}}^{m}\left\lbrack {{2i} + 1} \right\rbrack}\end{bmatrix}}}}},} & (35)\end{matrix}$

[0178] where the resulting noise {acute over (η)}_(n) _(r) ^(m)[i]:={overscore (U)}_(n) _(r) ^(H)·{overscore (η)}_(n) _(r) ^(m)[i] is still white with variance σ_(w) ². Since multiplying with a unitary matrix preserves ML optimality, we can deduce from (35) that the symbol blocks {tilde over (s)}^(m)[2i] and {tilde over (s)}^(m)[2i+1] can be decoded separately in an optimal way. As a result, the different symbol blocks {tilde over (s)}^(m)[i] can be detected independently from:

ý _(n) _(r) ^(m) [i]={tilde over (D)} _(n) _(r) ·{tilde over (s)} ^(m)[i]+ź _(n) _(r) ^(m) [i].  (36)
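Because all channel matrices are diagonal, the unitary combining of (34) and (35) reduces to two scalar operations per tone. The following Python sketch illustrates this for one receive antenna in the noise-free case; the function name and the test values are assumptions made for the example, and a tone on which both channel gains are exactly zero would need separate handling.

```python
import math


def separate_streams(y_even, y_odd, h1, h2):
    """Per-tone unitary combining of (34), giving the decoupled model (36).

    y_even, y_odd : despread blocks received at times 2i and 2i+1
    h1, h2        : per-tone gains from transmit antennas 1 and 2
    Returns (y0, y1, d): per tone, y0 = d*s[2i] and y1 = d*s[2i+1] plus noise.
    """
    y0, y1, d = [], [], []
    for ye, yo, a, b in zip(y_even, y_odd, h1, h2):
        dq = math.sqrt(abs(a) ** 2 + abs(b) ** 2)  # diagonal entry of D for this antenna
        yo_c = yo.conjugate()                      # second row of (34) uses y[2i+1]*
        y0.append((a.conjugate() * ye + b * yo_c) / dq)
        y1.append((b.conjugate() * ye - a * yo_c) / dq)
        d.append(dq)
    return y0, y1, d


h1 = [0.8 + 0.6j, 0.1 - 0.2j]
h2 = [-0.3 + 0.4j, 1.0 + 0j]
s0 = [1 + 1j, -1 + 1j]   # precoded block for time 2i
s1 = [1 - 1j, -1 - 1j]   # precoded block for time 2i+1
# Noise-free reception of the ST coded blocks of (27)
y_even = [a * x + b * y for a, b, x, y in zip(h1, h2, s0, s1)]
y_odd = [-a * y.conjugate() + b * x.conjugate() for a, b, x, y in zip(h1, h2, s0, s1)]
y0, y1, d = separate_streams(y_even, y_odd, h1, h2)
```

In this example each output tone equals the corresponding precoded symbol scaled by the real gain `d`, with no residual interference from the other transmit stream.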

[0179] Stacking the blocks from the different receive antennas {ý_(n) _(r) ^(m)[i]}_(n) _(r) ₌₁ ^(N) ^(_(R)) for the final receive antenna combining step, we obtain:

$\begin{matrix}{\underset{{\overset{'}{y}}^{m}{\lbrack i\rbrack}}{\underset{}{\begin{bmatrix}{{\overset{'}{y}}_{1}^{m}\lbrack i\rbrack} \\\vdots \\{{\overset{'}{y}}_{N_{R}}^{m}\lbrack i\rbrack}\end{bmatrix}}} = {{\underset{\overset{'}{H}}{\underset{}{\begin{bmatrix}{\overset{\sim}{D}}_{1} \\\vdots \\{\overset{\sim}{D}}_{N_{R}}\end{bmatrix}}} \cdot {{\overset{\sim}{s}}^{m}\lbrack i\rbrack}} + \underset{{\overset{'}{z}}^{m}{\lbrack i\rbrack}}{\underset{}{\begin{bmatrix}{{\overset{'}{z}}_{1}^{m}\lbrack i\rbrack} \\\vdots \\{{\overset{'}{z}}_{N_{R}}^{m}\lbrack i\rbrack}\end{bmatrix}}}}} & (37)\end{matrix}$

[0180] At this point, we have only collected the transmit antenna diversity at each receive antenna, but still need to collect the receive antenna diversity. Let us define the Q×Q matrix {tilde over (D)} with non-negative diagonal entries as:

$\overset{\sim}{D}:={\left\lbrack {\sum\limits_{n_{t} = 1}^{N_{T}}{\sum\limits_{n_{r} = 1}^{N_{R}}{{\overset{\sim}{H}}_{n_{r},n_{t}} \cdot {\overset{\sim}{H}}_{n_{r},n_{t}}^{*}}}} \right\rbrack^{1/2}.}$

[0181] From (37), we can verify that: {acute over (H)}^(H)·{acute over (H)}={tilde over (D)}². Based on {acute over (H)} and {tilde over (D)}, we can construct a tall unitary matrix Ú:={acute over (H)}·{tilde over (D)}⁻¹, which satisfies Ú^(H)·Ú=I_(Q) and Ú^(H)·{acute over (H)}={tilde over (D)}. Gathering the receive antenna diversity through multiplying (37) with Ú^(H), we finally obtain:

{tilde over (Y)} ^(m) [i]:=Ú ^(H) ·ý ^(m) [i]={tilde over (D)}·Θ·s ^(m)[i]+{tilde over (z)} ^(m) [i],  (38)

[0182] where the resulting noise {tilde over (z)}^(m)[i]:=Ú^(H)·ź^(m)[i] is still white with variance σ_(w) ². Since the multiplication with a tall unitary matrix that does not remove information also preserves ML decoding optimality, the blocks s^(m)[i] can be optimally decoded from (38). Moreover, (38) has the same structure as its single-antenna counterpart in (13). Hence, the design of the linear precoder Θ in Subsection IX-A.1.d, and the different equalization options that we have discussed in Section IX-A.2, can be applied here as well.
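The receive antenna combining step of (37) and (38) is again a per-tone operation: the outputs of the individual antennas are weighted by their own gains and normalized by the overall gain. The following Python sketch is a minimal noise-free illustration; the function name and the numeric values are assumptions for the example.

```python
import math


def combine_receive_antennas(y_per_antenna, d_per_antenna):
    """Per-tone receive combining of (37) into (38).

    y_per_antenna : y_per_antenna[nr][q], stream-separated output of antenna nr
    d_per_antenna : d_per_antenna[nr][q], real per-antenna gains
    Returns (y, d_total), where per tone y = d_total * s plus noise.
    """
    n_tones = len(d_per_antenna[0])
    # Overall gain: square root of the sum of squared per-antenna gains
    d_total = [math.sqrt(sum(d[q] ** 2 for d in d_per_antenna))
               for q in range(n_tones)]
    y = [sum(d[q] * y_ant[q] for d, y_ant in zip(d_per_antenna, y_per_antenna))
         / d_total[q] for q in range(n_tones)]
    return y, d_total


s = [1 + 1j, -1j]                        # precoded symbols on two tones
d_per_antenna = [[1.0, 2.0], [2.0, 0.5]]
y_per_antenna = [[dq * sq for dq, sq in zip(d, s)] for d in d_per_antenna]
y, d_total = combine_receive_antennas(y_per_antenna, d_per_antenna)
```

The combined output has the same single-gain-per-tone structure as the single-antenna model, which is why the equalizers of Section IX-A.2 can be reused unchanged.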

A.4 Simulation Results

[0183] We consider the downlink of a single-antenna MCBS-CDMA system, operating at a carrier frequency of F_(c)=2 GHz, and transmitting with a chip rate of R_(c)=1/T_(c)=4.096 MHz. Each user's bit sequence is QPSK modulated with n_(b)=2 bits per symbol. We assume that the multi-path channel is FIR with, unless otherwise stated, order L_(c)=3, and Rayleigh distributed channel taps of equal variance 1/(L_(c)+1). To satisfy the IBI removal condition L≧L_(c), we choose L=8. Note that this specific design can handle a delay spread of T_(g)=LT_(c)≈2 μs. However, a larger transmit redundancy can be used to handle more ISI. To limit the overhead, we choose the number of subcarriers Q=8L=64, leading to a transmitted block length K=Q+L=72. Hence, the information symbols are parsed into blocks of B=Q−L=56 symbols, and linearly precoded into blocks of size Q=64. The Q×B precoding matrix Θ constitutes the first B columns of the DCT matrix. The precoded symbol blocks are subsequently block spread by a real orthogonal Walsh-Hadamard spreading code of length N=16, along with a complex random scrambling code.
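The block dimensions quoted above follow directly from the chip rate and the chosen redundancy length. The short Python sketch below merely reproduces this arithmetic; the variable names are illustrative and not taken from the original.

```python
chip_rate_hz = 4.096e6   # R_c = 1/T_c
channel_order = 3        # L_c
L = 8                    # transmit redundancy length, must satisfy L >= L_c

Q = 8 * L                # number of subcarriers
K = Q + L                # transmitted block length
B = Q - L                # information symbols per block

guard_time_us = L / chip_rate_hz * 1e6   # tolerable delay spread T_g = L * T_c
redundancy_overhead = L / K              # share of each block spent on redundancy
```

With these values the guard time evaluates to about 1.95 μs, which the text rounds to 2 μs, and the transmit redundancy takes 8 of every 72 chips.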

A.4.a Comparison of Different Equalization Options

[0184] We test the different equalization options, discussed in Section IX-A.2, for a fully-loaded system with M=16 active users.

[0185] FIG. 9 compares the performance of the different Linear Equalizers (LEs) and Decision Feedback Equalizers (DFEs) that perform joint equalization and decoding. As a reference, the performance of a system without linear precoding (uncoded) as well as the optimal ML performance are also shown. Clearly, the system without linear precoding only achieves diversity 1, whereas ML detection achieves the full frequency-diversity gain L_(c)+1=4. The ZF-LE performs worse than the uncoded system at low SNR, but better at high SNR (SNR≧12 dB). The MMSE-LE always outperforms the uncoded system and achieves a diversity gain between 1 and L_(c)+1=4. At a BER of 10⁻³, it realizes a 5.5 dB gain compared to its ZF counterpart. The non-linear ZF- and MMSE-DFEs outperform their respective linear counterparts, although this effect is more pronounced for the ZF than for the MMSE criterion. At a BER of 10⁻³, the MMSE-DFE exhibits a 2.8 dB gain relative to the MMSE-LE, and comes within 1.4 dB of the optimal ML detector.

[0186] FIG. 10 compares the performance of separate versus joint linear equalization and decoding. On the one hand, the separate ZF-LE always performs worse than the uncoded system, due to the excessive noise enhancement caused by the presence of channel nulls. On the other hand, the separate MMSE-LE almost perfectly coincides with its corresponding joint MMSE-LE, and thus achieves a diversity gain between 1 and L_(c)+1=4.

A.4.b Comparison With DS-CDMA

[0187] In the following, we compare two different CDMA transceivers:

[0188] T1. The first transceiver applies the classical downlink DS-CDMA transmission scheme of the UMTS and the IS-2000 WCDMA standards. At the receiver, a time-domain MMSE chip equalizer based on perfect Channel State Information (CSI) is applied. The bandwidth efficiency of the first transceiver supporting M₁ users can be calculated as ε₁=M₁/N, where N is the length of the Walsh-Hadamard spreading codes.

[0189] T2. The second transceiver is our MCBS-CDMA transceiver, discussed in Section IX-A.1. At the receiver, a frequency-domain MMSE equalizer (performing equalization and decoding either jointly or separately) based on perfect CSI is used. The bandwidth efficiency of our transceiver supporting M₂ users can be calculated as

${\varepsilon_{2} = \frac{M_{2}B}{N\left( {B + {2L}} \right)}},$

[0190] where the overhead 2L stems from the redundant linear precoding and the IBI removal. In order to make a fair comparison between the two transceivers, we should force their respective bandwidth efficiencies to be the same, ε₁=ε₂, which leads to the following relationship between the number of users to be supported by the different transceivers:

$M_{2} = {\frac{B + {2L}}{B}{M_{1}.}}$

[0191] With B=56 and L=8, we can derive that M₂=(9/7)M₁.

[0192] FIG. 11 compares the performance of the two transceivers for a small system load with M₁=3 and M₂=4 (ε₁≈ε₂). Also shown in the figure is the optimal ML performance bound. At low SNR (SNR≦12 dB), T1 has a 1 dB advantage compared to T2. However, at high SNR (SNR≧12 dB), the performance of T1 already starts to floor, due to ISI/ICI and associated MUI. Hence, T2 outperforms T1 at high SNR.

[0193] FIG. 12 depicts the same curves but now for a large system load with M₁=12 and M₂=16 (ε₁≈ε₂). Since T2 is an MUI-free CDMA transceiver, its performance remains unaffected by the MUI. So, even at large system load, T2 achieves a diversity order between 1 and L_(c)+1=4. We also observe that T1 now performs poorly compared to T2: e.g. at a BER of 3·10⁻², T2 achieves a 9 dB gain compared to T1. In contrast with T2, which deterministically removes the MUI, T1 does not completely suppress this interference at high SNR. Hence, T1 suffers from a BER saturation level that increases with the system load M₁.

A.4.c Performance of Space-Time Block Coded MCBS-CDMA

[0194] We test our MIMO CDMA transceiver of Section IX-A.3, employing a cascade of STBC and MCBS-CDMA, for three different MIMO system setups (N_(T), N_(R)): the (1,1) setup, the (2,1) setup with TX diversity only and the (2,2) setup with both TX and RX diversity. The system is fully loaded, supporting M=16 active users. For each setup, both the MMSE-LE as well as the optimal ML detector are shown.

[0195] FIG. 13 depicts the results for frequency-selective channels with channel order L_(c)=1. Fixing the BER at 10⁻³ and focusing on the MMSE-LE, the (2,1) setup outperforms the (1,1) setup by 6 dB. The (2,2) setup in turn achieves a 3.5 dB gain compared to the (2,1) setup. Comparing the MMSE-LE with its corresponding ML detector, it incurs a 4 dB loss for the (1,1) setup, but only a 0.4 dB loss for the (2,2) setup. So, the larger the number of TX and/or RX antennas, the better the proposed transceiver with linear receiver processing succeeds in extracting the full diversity of order N_(T)N_(R)(L_(c)+1).

[0196] FIG. 14 shows the same results but now for frequency-selective channels with channel order L_(c)=3. Again fixing the BER at 10⁻³ and focusing on the MMSE-LE, the (2,1) setup outperforms the (1,1) setup by 4 dB, whereas the (2,2) setup in turn achieves a 2 dB gain compared to the (2,1) setup. So, compared to FIG. 13, the corresponding gains are now smaller because of the inherently larger underlying multi-path diversity.

A.5 Conclusion

[0197] To cope with the challenges of broadband cellular downlink communications, we have designed a novel Multi-Carrier (MC) CDMA transceiver that enables significant performance improvements compared to 3G cellular systems, yielding gains of up to 9 dB in full load situations. To this end, our so-called Multi-Carrier Block-Spread (MCBS) CDMA transceiver capitalizes on redundant block-spreading and linear precoding to preserve the orthogonality among users and to enable full multi-path diversity gains, regardless of the underlying multi-path channels. Different equalization options, ranging from linear to ML detection, strike the trade-off between performance and complexity. Specifically, the MMSE decision feedback equalizer realizes a 2.8 dB gain relative to its linear counterpart and performs within 1.4 dB of the optimal ML detector. Finally, our transceiver demonstrates a rewarding synergy with multi-antenna techniques to increase the spectral efficiency and/or improve the link reliability over MIMO channels. Specifically, our STBC/MCBS-CDMA transceiver retains the orthogonality among users as well as transmit streams to realize both multi-antenna and multi-path diversity gains of N_(T)N_(R)(L_(c)+1) for every user in the system, irrespective of the system load. Moreover, a low-complexity linear MMSE detector, which performs either joint or separate equalization and decoding, approaches the optimal ML performance (within 0.4 dB for a (2,2) system) and comes close to extracting the full diversity in reduced as well as full load settings.

B. EMBODIMENT

B.1 MC-DS-CDMA Downlink System Model

B.1.a Transmitter Model

[0198] Let us consider the downlink of a single-cell space-time coded MC-DS-CDMA system with U active mobile stations. As depicted in FIG. 15, at the base-station, which we suppose to have M_(t) transmit antennas, a space-time coded MC-DS-CDMA transmission scheme transforms the different user symbol sequences {s^(u)[i]}_(u=1) ^(U) and the pilot symbol sequence s^(p)[i] into M_(t) time-domain space-time coded multi-user chip sequences {u_(m) _(t) [n]}_(m) _(t) ₌₁ ^(M) ^(_(t)) , where u_(m) _(t) [n] is transmitted from the m_(t)-th transmit antenna. For simplicity, we will assume in the following that the base-station has only M_(t)=2 transmit antennas. Note however that the proposed techniques can be extended to the more general case of M_(t)>2 transmit antennas when resorting to the generalized orthogonal designs. As shown in FIG. 15, each user's data symbol sequence s^(u)[i] (and similarly the pilot symbol sequence s^(p)[i]) is serial-to-parallel converted into blocks of B symbols, leading to the symbol block sequence s^(u)[i]:=[s^(u)[iB] … s^(u)[(i+1)B−1]]^(T).

[0199] The symbol block sequence s^(u)[i] is linearly precoded by a Q×B matrix Θ, with Q the number of tones, to yield the precoded symbol block sequence {tilde over (s)}^(u)[i]:=Θ·s^(u)[i]. The precoded symbol block sequence {tilde over (s)}^(u)[i] is demultiplexed into M_(t) parallel sequences {{tilde over (s)}_(m) _(t) ^(u)[i]:={tilde over (s)}^(u)[iM_(t)+m_(t)−1]}_(m) _(t) ₌₁ ^(M) ^(_(t)) , where M_(t) is the number of transmit antennas. Each of the u-th user's precoded symbol block sequences {{tilde over (s)}_(m) _(t) ^(u)[i]}_(m) _(t) ₌₁ ^(M) ^(_(t)) is spread by a factor N with the same user code sequence c_(u)[n], which is the multiplication of the user specific orthogonal Walsh-Hadamard spreading code and the base-station specific scrambling code. For each of the M_(t) parallel streams, the different user chip block sequences are added up together with the pilot chip block sequence, resulting in the m_(t)-th multi-user chip block sequence:

$\begin{matrix}{{{\overset{\sim}{x}}_{m_{t}}\lbrack n\rbrack} = {{\sum\limits_{u = 1}^{U}{{{\overset{\sim}{s}}_{m_{t}}^{u}\lbrack i\rbrack}{c_{u}\lbrack n\rbrack}}} + {{{\overset{\sim}{s}}_{m_{t}}^{p}\lbrack i\rbrack}{c_{p}\lbrack n\rbrack}}}} & (39)\end{matrix}$

[0200] with i=└n/N┘. The Space-Time (ST) encoder operates in the frequency-domain and takes the two multi-user chip blocks {{tilde over (x)}_(m) _(t) [n]}_(m) _(t) ₌₁ ² to output the following 2Q×2 matrix of ST coded multi-user chip blocks:

$\begin{matrix}{\begin{bmatrix}{{\overset{\_}{x}}_{1}\left\lbrack {2n} \right\rbrack} & {{\overset{\_}{x}}_{1}\left\lbrack {{2n} + 1} \right\rbrack} \\{{\overset{\_}{x}}_{2}\left\lbrack {2n} \right\rbrack} & {{\overset{\_}{x}}_{2}\left\lbrack {{2n} + 1} \right\rbrack}\end{bmatrix} = \begin{bmatrix}{{\overset{\sim}{x}}_{1}\lbrack n\rbrack} & {- {{\overset{\sim}{x}}_{2}^{*}\lbrack n\rbrack}} \\{{\overset{\sim}{x}}_{2}\lbrack n\rbrack} & {{\overset{\sim}{x}}_{1}^{*}\lbrack n\rbrack}\end{bmatrix}} & (40)\end{matrix}$

[0201] At each time interval n, the ST coded multi-user chip blocks {overscore (x)}₁[n] and {overscore (x)}₂[n] are forwarded to the first and the second transmit antenna, respectively. From Equation 40, we can easily verify that the transmitted multi-user chip block at time instant 2n+1 from one antenna is the conjugate of the transmitted multi-user chip block at time instant 2n from the other antenna. The Q×Q IFFT matrix F_(Q) ^(H) transforms the frequency-domain ST coded multi-user chip block sequence {overscore (x)}_(m) _(t) [n] into the time-domain ST coded multi-user chip block sequence x_(m) _(t) [n]=F_(Q) ^(H)·{overscore (x)}_(m) _(t) [n]. The K×Q transmit matrix T, with K=Q+μ, adds a cyclic prefix of length μ to each block of the time-domain ST coded multi-user chip block sequence x_(m) _(t) [n], leading to the time-domain transmitted multi-user chip block sequence u_(m) _(t) [n]=T·x_(m) _(t) [n]. Finally, the time-domain transmitted multi-user chip block sequence u_(m) _(t) [n] is parallel-to-serial converted into K chips, obtaining the time-domain transmitted multi-user chip sequence [u_(m) _(t) [nK] … u_(m) _(t) [(n+1)K−1]]^(T):=u_(m) _(t) [n].
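The last two transmitter operations, the IFFT and the cyclic prefix insertion performed by the transmit matrix T, can be sketched in plain Python as follows. This is only an illustration: the unitary 1/√Q scaling of the inverse DFT and the helper names are assumptions, and a direct O(Q²) DFT sum is used instead of a fast transform to keep the example self-contained.

```python
import cmath
import math


def idft_unitary(x_freq):
    """Inverse DFT with unitary scaling, playing the role of the IFFT matrix."""
    q = len(x_freq)
    scale = 1 / math.sqrt(q)
    return [scale * sum(x_freq[k] * cmath.exp(2j * cmath.pi * k * n / q)
                        for k in range(q))
            for n in range(q)]


def add_cyclic_prefix(block, mu):
    """Action of the K x Q transmit matrix: copy the last mu chips to the front."""
    return block[len(block) - mu:] + block


def transmit_block(x_freq, mu):
    """Frequency-domain ST coded chip block -> time-domain block of K = Q + mu chips."""
    return add_cyclic_prefix(idft_unitary(x_freq), mu)


x_freq = [0, 1, 0, 0]              # energy on the second tone only (Q = 4)
u = transmit_block(x_freq, mu=2)   # K = 6 chips
```

The first μ chips of the transmitted block repeat its last μ chips, which is what later allows the receiver to discard the prefix and obtain a per-tone input/output relationship.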

B.1.b Receiver Model

[0202] We assume that the mobile station of interest is equipped with M_(r) receive antennas and has acquired perfect synchronisation. As shown in FIG. 16, at each receive antenna, the time-domain received chip sequence v_(m) _(r) [n] is serial-to-parallel converted into blocks of K chips, resulting in the time-domain received chip block sequence v_(m) _(r) [n]:=[v_(m) _(r) [nK] … v_(m) _(r) [(n+1)K−1]]^(T).

[0203] The Q×K receive matrix R discards the cyclic prefix of each block of the time-domain received chip block sequence v_(m) _(r) [n], leading to the time-domain received ST coded chip block sequence y_(m) _(r) [n]=R·v_(m) _(r) [n]. By transforming the time-domain received ST coded chip block sequence y_(m) _(r) [n] into the frequency-domain {overscore (y)}_(m) _(r) [n]:=F_(Q)·y_(m) _(r) [n] with the Q×Q FFT matrix F_(Q), assuming a sufficiently long cyclic prefix μ>L, we obtain a simple input/output relationship in the frequency-domain:

$\begin{matrix}{{{\overset{\_}{y}}_{m_{r}}\lbrack n\rbrack} = {{\sum\limits_{m_{t} = 1}^{M_{t}}{{\overset{\sim}{H}}_{m_{r},m_{t}} \cdot {{\overset{\_}{x}}_{m_{t}}\lbrack n\rbrack}}} + {{\overset{\_}{e}}_{m_{r}}\lbrack n\rbrack}}} & (41)\end{matrix}$

[0204] where {overscore (e)}_(m) _(r) [n] is the frequency-domain received noise block sequence and {tilde over (H)}_(m) _(r) _(,m) _(t) the Q×Q diagonal frequency-domain channel matrix having the frequency-domain channel response {overscore (h)}_(m) _(r) _(,m) _(t) as its main diagonal. Exploiting the structure of the ST code design in Equation 40, we can write for two consecutive chip blocks {overscore (y)}_(m) _(r) [2n] and {overscore (y)}_(m) _(r) *[2n+1] the frequency-domain input/output relationship of Equation 41, resulting in Equation 42:

$\begin{matrix}{\underset{\underset{{\overset{\sim}{y}}_{m_{r}}{\lbrack n\rbrack}}{}}{\begin{bmatrix}{{\overset{\_}{y}}_{m_{r}}\left\lbrack {2n} \right\rbrack} \\{{\overset{\_}{y}}_{m_{r}}^{*}\left\lbrack {{2n} + 1} \right\rbrack}\end{bmatrix}} = {{\underset{\underset{{\overset{\sim}{H}}_{m_{r}}}{}}{\begin{bmatrix}{\overset{\sim}{H}}_{m_{r},1} & {\overset{\sim}{H}}_{m_{r},2} \\{\overset{\sim}{H}}_{m_{r},2}^{*} & {- {\overset{\sim}{H}}_{m_{r},1}^{*}}\end{bmatrix}} \cdot \underset{\overset{\sim}{x}{\lbrack n\rbrack}}{\underset{}{\begin{bmatrix}{{\overset{\sim}{x}}_{1}\lbrack n\rbrack} \\{{\overset{\sim}{x}}_{2}\lbrack n\rbrack}\end{bmatrix}}}} + \underset{\underset{{\overset{\sim}{e}}_{m_{r}}{\lbrack n\rbrack}}{}}{\begin{bmatrix}{{\overset{\_}{e}}_{m_{r}}\left\lbrack {2n} \right\rbrack} \\{{\overset{\_}{e}}_{m_{r}}^{*}\left\lbrack {{2n} + 1} \right\rbrack}\end{bmatrix}}}} & (42)\end{matrix}$

Stacking the contributions of the M_(r) receive antennas

${{\overset{\sim}{y}\lbrack n\rbrack} = \left\lbrack {{{\overset{\sim}{y}}_{1}^{T}\lbrack n\rbrack}\quad \ldots \quad {{\overset{\sim}{y}}_{M_{r}}^{T}\lbrack n\rbrack}} \right\rbrack^{T}},$

[0205] we obtain the following per receive antenna frequency-domain data model:

{tilde over (y)}[n]={tilde over (H)}·{tilde over (x)}[n]+{tilde over(e)}[n]  (43)

[0206] where the per receive antenna channel matrix {tilde over (H)} and the per receive antenna noise block {tilde over (e)}[n] are similarly defined as the per receive antenna output block {tilde over (y)}[n]. Defining the receive permutation matrix P_(r) and the transmit permutation matrix P_(t), respectively, as follows:

ý[n]:=P_(r)·{tilde over (y)}[n], {tilde over (x)}[n]:=P_(t)·{acute over (x)}[n]  (44)

[0207] where P_(r) permutes a per receive antenna ordering into aper-tone ordering and where P_(t)conversely permutes a per-tone orderinginto a per transmit antenna ordering, we obtain the following per-tonedata model:

ý[n]={acute over (H)}·{acute over (x)}[n]+é[n]  (45)

[0208] In this Equation,${{\overset{'}{y}\lbrack n\rbrack} = \left\lbrack {{{\overset{'}{y}}_{1}^{T}\lbrack n\rbrack}\quad \ldots \quad {{\overset{'}{y}}_{Q}^{T}\lbrack n\rbrack}}\quad \right\rbrack^{T}}\quad$

[0209] is the per-tone output block, {acute over (x)}[n] the per-tone input block and é[n] the per-tone noise block, defined similarly to ý[n]. The per-tone channel matrix {acute over (H)} is a block diagonal matrix, given by: $\begin{matrix}{\overset{'}{H}:={{P_{r} \cdot \overset{\sim}{H} \cdot P_{t}} = \begin{bmatrix}{\overset{'}{H}}_{1} & \quad & \quad \\\quad & \ddots & \quad \\\quad & \quad & {\overset{'}{H}}_{Q}\end{bmatrix}}} & (46)\end{matrix}$

B.1.c Data Model for Burst Processing

[0210] Assuming a burst length of M_(t)·B·I symbols for each user, we can stack I·N consecutive chip blocks {tilde over (y)}[n], defined in Equation 43, into ${{\overset{\sim}{Y}\text{:}} = \left\lbrack {{\overset{\sim}{y}\lbrack 0\rbrack}\quad \ldots \quad {\overset{\sim}{y}\left\lbrack {{I\quad N} - 1} \right\rbrack}} \right\rbrack},$

[0211] leading to the following per receive antenna data model for burst processing:

{tilde over (Y)}={tilde over (H)}·{tilde over (X)}+{tilde over (E)}  (47)

[0212] where the input matrix {tilde over (X)} and the noise matrix {tilde over (E)} are defined similarly to the output matrix {tilde over (Y)}. From the definition of {tilde over (x)}[n] in Equation 42 and by inspecting Equation 39, we can write {tilde over (X)} as follows:

{tilde over (X)}={tilde over (S)}_(d)·C_(d)+{tilde over (S)}_(p)·C_(p)  (48)

[0213] where the multi-user total data symbol matrix {tilde over (S)}_(d):=[{tilde over (S)}₁ . . . {tilde over (S)}_(U)] stacks the total data symbol matrices of the different active users and the u-th user's total data symbol matrix {tilde over (S)}_(u):=[{tilde over (S)}₁ ^(uT) {tilde over (S)}₂ ^(uT)]^(T) stacks the u-th user's data symbol matrices for the different transmit antennas. The u-th user's data symbol matrix for the m_(t)-th transmit antenna ${{\overset{\sim}{S}}_{m_{t}}^{u}\text{:}} = \left\lbrack {{{\overset{\sim}{s}}_{m_{t}}^{u}\lbrack 0\rbrack}\quad \ldots \quad {{\overset{\sim}{s}}_{m_{t}}^{u}\left\lbrack {I - 1} \right\rbrack}} \right\rbrack$

[0214] stacks I consecutive precoded symbol blocks for the u-th user and the m_(t)-th transmit antenna. The total pilot symbol matrix {tilde over (S)}_(p) and the pilot symbol matrix for the m_(t)-th transmit antenna {tilde over (S)}_(m) _(t) ^(p) are defined similarly to {tilde over (S)}_(u) and {tilde over (S)}_(m) _(t) ^(u), respectively. The multi-user code matrix C_(d):=[C₁ ^(T) . . . C_(U) ^(T)]^(T) stacks the code matrices of the different active users. The u-th user's code matrix stacks the u-th user's code vectors at I consecutive symbol instants: $\begin{matrix}{C_{u}:=\begin{bmatrix}{c_{u}\lbrack 0\rbrack} & \quad & \quad \\\quad & \ddots & \quad \\\quad & \quad & {c_{u}\left\lbrack {I - 1} \right\rbrack}\end{bmatrix}} & (49)\end{matrix}$

[0215] where c_(u)[i]:=[c_(u)[iN] . . . c_(u)[(i+1)N−1]]

[0216] is the u-th user's code vector used to spread the precoded symbol blocks {{tilde over (s)}_(m) _(t) ^(u)[i]}_(m) _(t) ₌₁ ^(M) ^(_(t)).

[0217] Similarly to the per receive antenna data model for burst processing in Equation 47, we can stack I·N consecutive chip blocks ý[n], leading to the following per-tone data model for burst processing:

Ý={acute over (H)}·{acute over (X)}+É  (50)

[0218] Using Equations 44 and 48, we can express {acute over (X)} as follows:

{acute over (X)}=Ś_(d)·C_(d)+Ś_(p)·C_(p)  (51)

[0219] where Ś_(d):=P_(t) ^(T)·{tilde over (S)}_(d) and Ś_(p):=P_(t) ^(T)·{tilde over (S)}_(p) are the per-tone permuted versions of {tilde over (S)}_(d) and {tilde over (S)}_(p), respectively.

B.2 Per-Tone Burst Chip Equalizers

[0220] Inspired by our related work for the DS-CDMA downlink, we can now deal with the design of the chip equalizers. Starting from Equation 50 and assuming that the channel matrix {acute over (H)} has full column rank and the input matrix {acute over (X)} has full row rank, it is possible to find a Zero-Forcing (ZF) chip equalizer matrix G, for which:

G·Ý−{acute over (X)}=0  (52)

[0221] provided there is no noise present in the output matrix Ý. Since the channel matrix {acute over (H)} has a block diagonal structure, as shown in Equation 46, it suffices for the equalizer matrix G to have a block diagonal structure as well: $\begin{matrix}{G:=\begin{bmatrix}G_{1} & \quad & \quad \\\quad & \ddots & \quad \\\quad & \quad & G_{Q}\end{bmatrix}} & (53)\end{matrix}$

[0222] acting on a per-tone basis. For this reason, the ZF problem of Equation 52 decouples into Q parallel and independent ZF problems, one for each tone. Using Equation 51, we can rewrite the original ZF problem of Equation 52 as follows:

G·Ý−Ś_(d)·C_(d)−Ś_(p)·C_(p)=0  (54)

[0223] which is a ZF problem in both the equalizer matrix G and the multi-user total data symbol matrix Ś_(d).

B.2.a Training-Based Burst Chip Equalizer

[0224] The training-based chip equalizer determines its equalizer coefficients from the per-tone output matrix Ý based on the knowledge of the pilot code matrix C_(p) and the total pilot symbol matrix Ś_(p). By despreading Equation 54 with the pilot code matrix C_(p), we obtain:

G·Ý·C_(p) ^(H)−Ś_(p)=0  (55)

[0225] because of the orthogonality between the multi-user code matrix C_(d) and the pilot code matrix C_(p). In case noise is present in the output matrix Ý, we have to solve the corresponding Least Squares (LS) minimisation problem: $\begin{matrix}{\hat{G} = {\arg \quad {\min\limits_{G}{\left\| {{G \cdot \overset{'}{Y} \cdot C_{p}^{H}} - {\overset{'}{S}}_{p}} \right\|_{F}^{2}}}}} & (56)\end{matrix}$

[0226] which can be interpreted as follows. The equalized output matrix G·Ý is despread with the pilot code matrix C_(p). The equalized output matrix after despreading, G·Ý·C_(p) ^(H), should then be as close as possible to the known total pilot symbol matrix Ś_(p) in a Least Squares sense.
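The training-based estimate of Equation 56 can be sketched numerically for a single tone. The following is a minimal illustration, not the implementation of the invention: the per-tone channel is drawn as an unstructured random matrix (the real per-tone channel has the structure of Equation 42), Walsh row 0 is assigned to the pilot, the overlay code is taken to be unit-modulus, and all function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

def code_matrix(walsh_row, scramble, I, N):
    """I x (I*N) block-diagonal code matrix with unit-norm composite code rows."""
    C = np.zeros((I, I * N), dtype=complex)
    for i in range(I):
        C[i, i * N:(i + 1) * N] = walsh_row * scramble[i * N:(i + 1) * N] / np.sqrt(N)
    return C

def qpsk(shape):
    """Unit-energy QPSK symbols."""
    return (rng.choice([-1.0, 1.0], shape) + 1j * rng.choice([-1.0, 1.0], shape)) / np.sqrt(2)

N, I, U, Mt, Mr = 8, 20, 3, 2, 2            # sizes taken from the simulation setup
W = hadamard(N)
scramble = np.exp(1j * np.pi / 2 * rng.integers(0, 4, I * N))   # assumed overlay code

C_p = code_matrix(W[0], scramble, I, N)      # pilot on Walsh row 0 (assumption)
C_users = [code_matrix(W[u + 1], scramble, I, N) for u in range(U)]

S_p = qpsk((Mt, I))                          # known pilot symbols on this tone
S_users = [qpsk((Mt, I)) for _ in range(U)]  # unknown data symbols on this tone

# One tone of Equation 51, followed by a noiseless per-tone channel
X = S_p @ C_p + sum(S @ C for S, C in zip(S_users, C_users))
H = (rng.standard_normal((2 * Mr, Mt)) + 1j * rng.standard_normal((2 * Mr, Mt))) / np.sqrt(2)
Y = H @ X

# Despread with the pilot code, then solve G @ A = S_p in the LS sense (Equation 56)
A = Y @ C_p.conj().T
G = np.linalg.lstsq(A.T, S_p.T, rcond=1e-10)[0].T

print("equalizer recovers the chip matrix:", np.allclose(G @ Y, X))
```

Without noise the LS solution satisfies the ZF condition of Equation 52 on this tone; adding noise to `Y` turns the same call into the regularisation-free LS estimate.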

B.2.b Semi-Blind Burst Chip Equalizer

[0227] The semi-blind chip equalizer determines its equalizer coefficients from the per-tone output matrix Ý based on the knowledge of the multi-user code matrix C_(d), the pilot code matrix C_(p) and the total pilot symbol matrix Ś_(p). Solving Equation 54 first for Ś_(d), assuming G to be known and fixed, gives {circumflex over ({acute over (S)})}_(d)=G·Ý·C_(d) ^(H). Substituting {circumflex over ({acute over (S)})}_(d) into Equation 54 leads to a semi-blind ZF problem in G only:

G·Ý·(I_(IN)−C_(d) ^(H)·C_(d))−Ś_(p)·C_(p)=0  (57)

[0228] In case noise is present in the output matrix Ý, we have to solve the corresponding LS minimisation problem: $\begin{matrix}{\hat{G} = {\arg \quad {\min\limits_{G}{\left\| {{G \cdot \overset{'}{Y} \cdot \left( {I_{IN} - {C_{d}^{H} \cdot C_{d}}} \right)} - {{\overset{'}{S}}_{p} \cdot C_{p}}} \right\|_{F}^{2}}}}} & (58)\end{matrix}$

[0229] which can be interpreted as follows. The equalized output matrix G·Ý is projected on the orthogonal complement of the subspace spanned by the multi-user code matrix C_(d). The equalized output matrix after projecting, G·Ý·(I_(IN)−C_(d) ^(H)·C_(d)), should then be as close as possible to the known total pilot chip matrix Ś_(p)·C_(p) in a Least Squares sense.
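The properties of the projection matrix I_(IN)−C_(d) ^(H)·C_(d) used above can be checked on a toy example. This sketch assumes unit-norm Walsh-Hadamard codes with a real ±1 overlay code and reserves Walsh row 0 for the pilot; the helper names are illustrative only.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

def code_matrix(walsh_row, scramble, I, N):
    """I x (I*N) block-diagonal code matrix with unit-norm composite code rows."""
    C = np.zeros((I, I * N), dtype=complex)
    for i in range(I):
        C[i, i * N:(i + 1) * N] = walsh_row * scramble[i * N:(i + 1) * N] / np.sqrt(N)
    return C

rng = np.random.default_rng(1)
N, I = 8, 4
W = hadamard(N)
scramble = rng.choice([-1.0, 1.0], I * N)    # assumed real +/-1 overlay code

def projector(U):
    """Return (P, C_d, C_p) for U active users; the pilot uses Walsh row 0 (assumption)."""
    C_p = code_matrix(W[0], scramble, I, N)
    C_d = np.vstack([code_matrix(W[u + 1], scramble, I, N) for u in range(U)])
    P = np.eye(I * N) - C_d.conj().T @ C_d
    return P, C_d, C_p

P, C_d, C_p = projector(U=3)
idempotent = np.allclose(P @ P, P)           # P is a projector
kills_data = np.allclose(C_d @ P, 0)         # data chips are removed
keeps_pilot = np.allclose(C_p @ P, C_p)      # pilot chips pass unchanged

# With all N-1 data codes in use, only the pilot code subspace is left
P_full, _, C_p_full = projector(U=N - 1)
full_load_equal = np.allclose(P_full, C_p_full.conj().T @ C_p_full)

print(idempotent, kills_data, keeps_pilot, full_load_equal)
```

Under these assumptions the projector at full load reduces to C_(p) ^(H)·C_(p), in which case the cost function of Equation 58 has the same minimiser as that of Equation 56.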

B.2.c User-Specific Detection

[0230] The obtained per-tone pilot-trained chip equalizer matrix Ĝ, whether training-based or semi-blind, may subsequently be used to extract the desired user's total data symbol matrix:

Ŝ_(u)={tilde over (Θ)}^(H)·P_(t)·Ĝ·Ý·C_(u) ^(H)  (59)

[0231] where the equalized output matrix Ĝ·Ý is first despread with the desired user's code matrix C_(u). Next, the transmit permutation matrix P_(t) permutes the per-tone ordering of the despread equalized output matrix into a per transmit antenna ordering. Finally, the total precoding matrix {tilde over (Θ)} linearly decodes the permuted version of the despread equalized output matrix, where {tilde over (Θ)} is an M_(t)·Q×M_(t)·B block diagonal matrix with the precoding matrix Θ on its main diagonal.

B.3 Simulation Results

[0232] We consider the downlink of an ST coded MC-DS-CDMA system with M_(t)=2 transmit antennas at the base-station, M_(r)=2 receive antennas at the mobile station of interest, QPSK data modulation, an initial block length of B=13, real orthogonal Walsh-Hadamard spreading codes of length N=8 along with a random overlay code for scrambling and U=3 (half system load) active user terminals. We assume that each channel K_(m) _(r) _(,m) _(t) [l] is FIR with order L=3. Each channel tap is Rayleigh distributed with equal average power. The precoded block length Q (or equivalently the number of tones) either equals Q=B=13 (without linear precoding) or Q=B+L=16 (with linear precoding). The length of the cyclic prefix is μ=L=3 and we assume a burst length of M_(t)·B·I=520 (I=20).

[0233] FIG. 17 compares the average BER versus the average SNR per bit of the pilot-trained chip equalizers and the ideal fully-trained chip equalizer (CE) for a system without linear precoding. Also shown in the figure is the theoretical BER-curve for QPSK with M_(t)·M_(r)=4-fold diversity in Rayleigh fading channels (single-user bound). The performance of the ideal fully-trained CE perfectly coincides with the single-user bound whereas both the training-based and the semi-blind pilot-trained CE are within 1 dB of the ideal one.

[0234] FIG. 18 shows the same curves but now for a system with linear precoding, where the precoding matrix Θ constitutes the first B columns of the FFT matrix F_(Q). The single-user bound now becomes the theoretical BER-curve for QPSK with M_(t)·M_(r)·(L+1)=16-fold instead of merely M_(t)·M_(r)=4-fold diversity in Rayleigh fading channels. The performance of the ideal fully-trained CE is within 1 dB whereas that of the training-based pilot-trained CE is within 2 dB of the single-user bound. The semi-blind pilot-trained CE incurs a 0.5 dB loss compared to the training-based CE because the linear decoding is less robust against the orthogonal projection operation in the semi-blind CE.

[0235] We can conclude that the per-tone pilot-trained chip equalizer with the training-based cost function is a promising technique for downlink reception in future broadband wireless communication systems based on a space-time coded MC-DS-CDMA transmission scheme.

C. Embodiment

C.1 SCBT-DS-CDMA Downlink System Model

[0236] Let us consider the downlink of a single-cell space-time block coded SCBT-DS-CDMA system with U active mobile stations. The base-station is equipped with M_(t) transmit (TX) antennas whereas the mobile station of interest is equipped with M_(r) receive (RX) antennas.

C.1.a Transmitter Model for the Base Station

[0237] For simplicity, we will assume in the following that the base station has only M_(t)=2 transmit antennas. Note however that the proposed techniques can be extended to the more general case of M_(t)>2 transmit antennas by resorting to generalized orthogonal designs. As shown in FIG. 19, each user's data symbol sequence s^(u)[i] (and similarly the pilot symbol sequence s^(p)[i]) is demultiplexed into M_(t) parallel lower rate sequences {s_(m) _(t) ^(u)[i]:=s^(u)[iM_(t)+m_(t)−1]}_(m) _(t) ₌₁ ^(M) ^(_(t)), where M_(t) is the number of transmit antennas. Each of the u-th user's symbol sequences {s_(m) _(t) ^(u)[i]}_(m) _(t) ₌₁ ^(M) ^(_(t)) is serial-to-parallel converted into blocks of B symbols, leading to the symbol block sequences $\left\{ {{s_{m_{t}}^{u}\lbrack i\rbrack}:=\left\lbrack {{s_{m_{t}}^{u}\lbrack {iB}\rbrack},\ \ldots\ ,{s_{m_{t}}^{u}\left\lbrack {{(i+1)B} - 1} \right\rbrack}} \right\rbrack^{T}} \right\}_{m_{t} = 1}^{M_{t}}$

[0238] that are subsequently spread by a factor N with the same user composite code sequence c_(u)[n], which is the multiplication of the user specific orthogonal Walsh-Hadamard spreading code and the base station specific scrambling code. For each of the M_(t) parallel streams, the different user chip block sequences and the pilot chip block sequence are added, resulting in the m_(t)-th multi-user chip block sequence: $\begin{matrix}{{{x_{m_{t}}\lbrack n\rbrack} = {{\sum\limits_{u = 1}^{U}\quad {{s_{m_{t}}^{u}\lbrack i\rbrack}{c_{u}\lbrack n\rbrack}}} + {{s_{m_{t}}^{p}\lbrack i\rbrack}{c_{p}\lbrack n\rbrack}}}},\quad {i = \left\lfloor \frac{n}{N} \right\rfloor}} & (60)\end{matrix}$

[0239] Let us also define the u-th user's total symbol block sequence s^(u)[i]:=[s₁ ^(u)[i]^(T), s₂ ^(u)[i]^(T)]^(T)

[0240] and the

[0241] total multi-user chip block sequence x[n]:=[x₁ ^(T)[n], x₂ ^(T)[n]]^(T). The block Space-Time (ST) encoder operates in the time-domain (TD) at the chip block level rather than at the symbol block level and takes the two multi-user chip blocks {x_(m) _(t) [n]}_(m) _(t) ₌₁ ² to output the following 2B×2 matrix of ST coded multi-user chip blocks: $\begin{matrix}{\begin{bmatrix}{{\overset{\_}{x}}_{1}\left\lbrack {2n} \right\rbrack} & {{\overset{\_}{x}}_{1}\left\lbrack {{2n} + 1} \right\rbrack} \\{{\overset{\_}{x}}_{2}\left\lbrack {2n} \right\rbrack} & {{\overset{\_}{x}}_{2}\left\lbrack {{2n} + 1} \right\rbrack}\end{bmatrix} = \begin{bmatrix}{x_{1}\lbrack n\rbrack} & {{- P_{B}^{(0)}} \cdot {x_{2}^{*}\lbrack n\rbrack}} \\{x_{2}\lbrack n\rbrack} & {P_{B}^{(0)} \cdot {x_{1}^{*}\lbrack n\rbrack}}\end{bmatrix}} & (61)\end{matrix}$

[0242] where P_(J) ^((j)) is a J×J permutation matrix implementing a reversed cyclic shift over j positions. At each time interval n, the ST coded multi-user chip blocks {overscore (x)}₁[n] and {overscore (x)}₂[n] are forwarded to the first and the second transmit antenna, respectively. From Equation 61, we can easily verify that the transmitted multi-user chip block at time instant 2n+1 from one antenna is the time-reversed conjugate of the transmitted multi-user chip block at time instant 2n from the other antenna (with a possible sign change). The K×B transmit matrix T_(zp), with K=B+μ, pads a zero postfix of length μ to each block of the ST coded multi-user chip block sequence {overscore (x)}_(m) _(t) [n], leading to the transmitted multi-user chip block sequence u_(m) _(t) [n]=T_(zp)·{overscore (x)}_(m) _(t) [n]. Finally, the transmitted multi-user chip block sequence u_(m) _(t) [n] is parallel-to-serial converted into the transmitted multi-user chip sequence [u_(m) _(t) [nK], . . . , u_(m) _(t) [(n+1)K−1]]^(T):=u_(m) _(t) [n].
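The encoder of Equation 61 followed by zero padding can be sketched as below. The text does not spell out the indexing of P_(J) ^((j)); this sketch assumes the common convention "reverse the block, then cyclically shift it by j positions", so that P_(B) ^((0)) is a plain time reversal. The function names are hypothetical.

```python
import numpy as np

def rev_cyclic_shift(a, j):
    """Apply P_J^(j): reverse the block, then cyclically shift by j positions (assumed convention)."""
    return np.roll(a[::-1], j)

def st_encode(x1, x2):
    """Equation 61: ST coded blocks indexed as coded[antenna][time slot]."""
    return [[x1, -rev_cyclic_shift(np.conj(x2), 0)],
            [x2, rev_cyclic_shift(np.conj(x1), 0)]]

def zero_pad(xbar, mu):
    """T_zp: append a zero postfix of length mu to a block."""
    return np.concatenate([xbar, np.zeros(mu, dtype=xbar.dtype)])

rng = np.random.default_rng(2)
B, mu = 13, 3
K = B + mu
x1 = rng.standard_normal(B) + 1j * rng.standard_normal(B)
x2 = rng.standard_normal(B) + 1j * rng.standard_normal(B)

coded = st_encode(x1, x2)
u = [[zero_pad(blk, mu) for blk in antenna] for antenna in coded]

# Second-slot blocks are time-reversed conjugates of the other antenna's first-slot blocks
print(np.allclose(coded[0][1], -np.conj(x2[::-1])),
      np.allclose(coded[1][1], np.conj(x1[::-1])))
```

With this convention P_(K) ^((B))·T_(zp)·P_(B) ^((0))=T_(zp), i.e. applying P_(K) ^((B)) to a zero-padded, time-reversed block restores the original block order; this appears to be the reason the receiver applies P_(K) ^((B)) to the conjugated odd-indexed blocks.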

C.1.b Receiver Model for the Mobile Station

[0243] We assume that the mobile station of interest is equipped with M_(r) receive antennas and has acquired perfect synchronisation. At each receive antenna in FIG. 20, the TD received chip sequence v_(m) _(r) [n] is serial-to-parallel converted into blocks of K chips, resulting in the TD received chip block sequence v_(m) _(r) [n]:=[v_(m) _(r) [nK], . . . , v_(m) _(r) [(n+1)K−1]]^(T). The K×K receive matrix R:=I_(K) completely preserves each block of the TD received chip block sequence v_(m) _(r) [n], leading to the TD received ST coded chip block sequence {overscore (y)}_(m) _(r) [n]=R·v_(m) _(r) [n]. Assuming a sufficiently long zero postfix μ≧L (L is the maximum channel order), we obtain a simple input/output relationship in the time-domain: $\begin{matrix}{{{\overset{\_}{y}}_{m_{r}}\lbrack n\rbrack} = {{\sum\limits_{m_{t} = 1}^{M_{t}}\quad {{\overset{.}{H}}_{m_{r},m_{t}} \cdot T_{zp} \cdot {{\overset{\_}{x}}_{m_{t}}\lbrack n\rbrack}}} + {{\overset{\_}{e}}_{m_{r}}\lbrack n\rbrack}}} & (62)\end{matrix}$

[0244] where {overscore (e)}_(m) _(r) [n] is the TD received noise block sequence and {dot over (H)}_(m) _(r) _(,m) _(t) is a K×K circulant channel matrix. We consider two consecutive chip blocks and define y_(m) _(r) _(,1)[n]:={overscore (y)}_(m) _(r) [2n] and y_(m) _(r) _(,2)[n]:=P_(K) ^((B))·{overscore (y)}_(m) _(r) *[2n+1]. Transforming y_(m) _(r) _(,1)[n] and y_(m) _(r) _(,2)[n] to the frequency-domain (FD) employing the K×K FFT matrix F_(K) leads to the input/output relationship of Equation 63, where {tilde over (H)}_(m) _(r) _(,m) _(t) =F_(K)·{dot over (H)}_(m) _(r) _(,m) _(t) ·F_(K) ^(H) is the K×K diagonal FD channel matrix having the FD channel response {tilde over (h)}_(m) _(r) _(,m) _(t) as its main diagonal: $\begin{matrix}{\underset{\underset{{{\overset{\sim}{y}}_{m_{r}}{\lbrack n\rbrack}}}{}}{\begin{bmatrix}{F_{K} \cdot {{\overset{\_}{y}}_{m_{r}}\left\lbrack {2n} \right\rbrack}} \\{F_{K} \cdot P_{K}^{(B)} \cdot {{\overset{\_}{y}}_{m_{r}}^{*}\left\lbrack {{2n} + 1} \right\rbrack}}\end{bmatrix}} = {{\underset{\underset{{\overset{\sim}{H}}_{m_{r}}}{}}{\begin{bmatrix}{\overset{\sim}{H}}_{m_{r},1} & {\overset{\sim}{H}}_{m_{r},2} \\{\overset{\sim}{H}}_{m_{r},2}^{*} & {- {\overset{\sim}{H}}_{m_{r},1}^{*}}\end{bmatrix}} \cdot \underset{\underset{{\overset{\sim}{x}{\lbrack n\rbrack}}}{}}{\begin{bmatrix}{F_{K} \cdot T_{zp} \cdot {x_{1}\lbrack n\rbrack}} \\{F_{K} \cdot T_{zp} \cdot {x_{2}\lbrack n\rbrack}}\end{bmatrix}}} + \underset{\underset{{{\overset{\sim}{e}}_{m_{r}}{\lbrack n\rbrack}}}{}}{\begin{bmatrix}{F_{K} \cdot {{\overset{\_}{e}}_{m_{r}}\left\lbrack {2n} \right\rbrack}} \\{F_{K} \cdot P_{K}^{(B)} \cdot {{\overset{\_}{e}}_{m_{r}}^{*}\left\lbrack {{2n} + 1} \right\rbrack}}\end{bmatrix}}}} & (63)\end{matrix}$ Note from Equation 63 that {tilde over (x)}[n]:=ℱ_(K)·𝒯_(zp)·x[n],

[0245] where both the compound FFT matrix ℱ_(K):=diag {F_(K), F_(K)} and the compound transmit matrix 𝒯_(zp):=diag {T_(zp), T_(zp)} are block diagonal. Stacking the contributions of the M_(r) receive antennas {tilde over (y)}[n]=[{tilde over (y)}₁ ^(T)[n], . . . , {tilde over (y)}_(M) _(r) ^(T)[n]]^(T), we obtain the following per-RX-antenna FD data model:

{tilde over (y)}[n]={tilde over (H)}·{tilde over (x)}[n]+{tilde over (e)}[n]  (64)

[0246] where the per-RX-antenna channel matrix {tilde over (H)} and the per-RX-antenna noise block {tilde over (e)}[n] are defined similarly to the per-RX-antenna output block {tilde over (y)}[n]. Defining the per-tone input block {acute over (x)}[n] and the per-tone output block ý[n] as:

{acute over (x)}[n]:=P_(t)·{tilde over (x)}[n]=[{acute over (x)}₁[n], . . . , {acute over (x)}_(K)[n]]^(T)

ý[n]:=P_(r)·{tilde over (y)}[n]=[ý₁[n], . . . , ý_(K)[n]]^(T)  (65)

[0247] where P_(t) permutes a per-TX-antenna ordering into a per-tone ordering and where P_(r) permutes a per-RX-antenna ordering into a per-tone ordering, we obtain the following per-tone data model:

ý[n]={acute over (H)}·{acute over (x)}[n]+é[n]  (66)

[0248] where é[n] is the per-tone noise block, defined similarly to ý[n]. The per-tone channel matrix {acute over (H)} is a block diagonal matrix, given by:

{acute over (H)}:=P_(r)·{tilde over (H)}·P_(t) ^(T)=diag {{acute over (H)}₁, . . . , {acute over (H)}_(K)}  (67)
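The effect of the permutations in Equation 67 can be illustrated with a small numerical sketch. It builds a per-RX-antenna channel matrix with the structure of Equation 63 from random diagonal blocks and checks that the permuted matrix is block diagonal. The helper `to_per_tone` and the toy sizes are illustrative assumptions, not part of the described system.

```python
import numpy as np

def to_per_tone(M, K):
    """Permutation matrix mapping an antenna-major stacking (M blocks of K tones)
    to a tone-major stacking (K blocks of M entries)."""
    perm = np.arange(M * K).reshape(M, K).T.ravel()
    return np.eye(M * K)[perm]

rng = np.random.default_rng(3)
K, Mr = 4, 2                                   # toy number of tones and RX antennas

def diag_channel():
    """Random K x K diagonal FD channel matrix (hypothetical channel response)."""
    return np.diag(rng.standard_normal(K) + 1j * rng.standard_normal(K))

blocks = []
for _ in range(Mr):
    H1, H2 = diag_channel(), diag_channel()
    blocks.append(np.block([[H1, H2], [H2.conj(), -H1.conj()]]))   # structure of Equation 63
H_tilde = np.vstack(blocks)                    # (2*Mr*K) x (2*K)

P_r = to_per_tone(2 * Mr, K)                   # two stacked blocks per RX antenna
P_t = to_per_tone(2, K)                        # M_t = 2 TX antennas
H_acute = P_r @ H_tilde @ P_t.T                # Equation 67

# Everything outside the K diagonal blocks of size (2*Mr) x 2 should vanish
mask = np.kron(np.eye(K), np.ones((2 * Mr, 2)))
off_block_energy = np.abs(H_acute * (1 - mask)).max()
print(off_block_energy)
```

Each diagonal block collects the channel coefficients of all antenna pairs on a single tone, which is what allows the equalizer to be designed tone by tone.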

C.1.c Data Model for Burst Processing

[0249] Assuming a burst length of M_(t)·B·I symbols for each user, we can stack I·N consecutive chip blocks {tilde over (y)}[n], defined in Equation 64, into {tilde over (Y)}:=[{tilde over (y)}[0], . . . , {tilde over (y)}[IN−1]], leading to the following per-RX-antenna data model for burst processing:

{tilde over (Y)}={tilde over (H)}·{tilde over (X)}+{tilde over (E)}  (68)

[0250] where the input matrix {tilde over (X)} and the noise matrix {tilde over (E)} are defined similarly to the output matrix {tilde over (Y)}. Note that

{tilde over (X)}=ℱ_(K)·𝒯_(zp)·X  (69)

[0251] where X stacks I·N consecutive total multi-user chip blocks x[n]. Moreover, by inspecting Equation 60, we can write X as:

X=S _(d)·C_(d)+S_(p)·C_(p)  (70)

[0252] where the multi-user total data symbol matrix S_(d):=[S₁, . . . , S_(U)] stacks the total data symbol matrices of the different active users and the u-th user's total data symbol matrix S_(u):=[s^(u)[0], . . . , s^(u)[I−1]] stacks I consecutive total symbol blocks for the u-th user. The total pilot symbol matrix S_(p) is defined similarly to S_(u). The multi-user code matrix C_(d):=[C₁ ^(T), . . . , C_(U) ^(T)]^(T) stacks the code matrices of the different active users. The u-th user's code matrix stacks the u-th user's composite code vectors at I consecutive symbol block instants:

C_(u):=diag{c_(u) ^(T)[0], . . . , c_(u) ^(T)[I−1]}  (71)

[0253] where c_(u)[i]:=[c_(u)[iN], . . . , c_(u)[(i+1)N−1]]^(T) is the u-th user's composite code vector used to spread the total symbol block s^(u)[i]. The pilot code matrix C_(p) is defined similarly to C_(u).
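The equivalence between the chip-by-chip spreading of Equation 60 and the matrix product S_(u)·C_(u) of Equation 70 can be verified with a short sketch for one code. The sizes are toy values and the code chips are arbitrary unit-modulus numbers standing in for a composite code; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
R, I, N = 6, 3, 4        # R rows per total symbol block (2B in the text), toy sizes

def chipwise(S, c):
    """Equation 60 for one code: x[n] = s[i] * c[n] with i = floor(n / N), stacked column-wise."""
    return np.stack([S[:, n // N] * c[n] for n in range(I * N)], axis=1)

def code_matrix(c):
    """Equation 71: block-diagonal matrix whose i-th row holds the chips of symbol instant i."""
    C = np.zeros((I, I * N), dtype=complex)
    for i in range(I):
        C[i, i * N:(i + 1) * N] = c[i * N:(i + 1) * N]
    return C

S_u = rng.standard_normal((R, I)) + 1j * rng.standard_normal((R, I))
c_u = np.exp(1j * np.pi / 2 * rng.integers(0, 4, I * N))   # stand-in composite code chips

X_loop = chipwise(S_u, c_u)
X_matrix = S_u @ code_matrix(c_u)
print(np.allclose(X_loop, X_matrix))
```

Summing such products over all users and the pilot gives the matrix X of Equation 70.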

[0254] Similarly to the per-RX-antenna data model for burst processing in Equation 68, we can stack I·N consecutive chip blocks ý[n], leading to the following per-tone data model for burst processing:

Ý={acute over (H)}·{acute over (X)}+É  (72)

[0255] Using Equations 65, 69 and 70, we can express {acute over (X)} as:

{acute over (X)}=Ś_(d)·C_(d)+Ś_(p)·C_(p)  (73)

[0256] where Ś_(d):=P_(t)·{tilde over (S)}_(d) and Ś_(p):=P_(t)·{tilde over (S)}_(p) are the per-tone permuted versions of {tilde over (S)}_(d):=ℱ_(K)·𝒯_(zp)·S_(d) and {tilde over (S)}_(p):=ℱ_(K)·𝒯_(zp)·S_(p), respectively.

C.2 Burst Frequency-Domain Chip Equalization

[0257] Armed with a suitable data model for burst processing, we can now proceed with the design of different Least Squares (LS) type burst FD chip equalizers that process a burst of M_(t)·I data symbol blocks at once. Note that Recursive Least Squares (RLS) type adaptive FD chip equalizers that process the data on a symbol block by symbol block basis can be easily derived from their corresponding LS burst version. Starting from Equation 72 and assuming the channel matrix {acute over (H)} to have full column rank and the input matrix {acute over (X)} to have full row rank, it is always possible to find a Zero-Forcing (ZF) chip equalizer matrix {acute over (G)}, for which {acute over (G)}·Ý−{acute over (X)}=0, provided there is no noise present in the output matrix Ý. In the presence of noise, we have to solve the corresponding Least Squares (LS) minimization problem, which we denote for convenience as: $\begin{matrix}{{{\overset{\prime}{G} \cdot \overset{\prime}{Y}} - \overset{\prime}{X}}\overset{LS}{=}0} & (74)\end{matrix}$

[0258] Since the channel matrix {acute over (H)} has a block diagonal structure, as shown in Equation 67, it suffices for the equalizer matrix {acute over (G)} to have a block diagonal structure as well:

{acute over (G)}:=diag {{acute over (G)}₁, . . . , {acute over (G)}_(K)}  (75)

[0259] acting on a per-tone basis at the chip block level (see also FIG. 6). For this reason, the LS problem of Equation 74 decouples into K parallel and independent LS problems, one for each tone. Using Equation 73, we can rewrite the original LS problem of Equation 74 as: $\begin{matrix}{{{\overset{\prime}{G} \cdot \overset{\prime}{Y}} - {{\overset{\prime}{S}}_{d} \cdot C_{d}} - {{\overset{\prime}{S}}_{p} \cdot C_{p}}}\overset{LS}{=}0} & (76)\end{matrix}$

[0260] which is an LS problem in both the equalizer matrix {acute over (G)} and the multi-user total data symbol matrix Ś_(d). Starting from Equation 76, we will design in the following two different FD methods for direct chip equalizer estimation that differ in the amount of a-priori information they exploit to determine the equalizer coefficients. The first method, coined CDMP-trained, only exploits the presence of a Code Division Multiplexed Pilot (CDMP). The second method, coined semi-blind CDMP-trained, additionally exploits knowledge of the multi-user code correlation matrix.

C.2.a CDMP-Trained Chip Equalizer

[0261] The CDMP-trained chip equalizer estimator directly determines the equalizer coefficients from the per-tone output matrix Ý based on the knowledge of the pilot code matrix C_(p) and the total pilot symbol matrix S_(p). By despreading Equation 76 with the pilot code matrix C_(p), we obtain: $\begin{matrix}{{{\overset{\prime}{G} \cdot \overset{\prime}{Y} \cdot C_{p}^{H}} - {\overset{\prime}{S}}_{p}}\overset{LS}{=}0} & (77)\end{matrix}$

[0262] because C_(d)·C_(p) ^(H)=0 due to the orthogonality of the user and pilot composite code sequences at each symbol instant. Equation 77 can be interpreted as follows. The equalized per-tone output matrix after despreading, {acute over (G)}·Ý·C_(p) ^(H), should be as close as possible in a Least Squares sense to the per-tone pilot symbol matrix Ś_(p).

C.2.b Semi-blind CDMP-Trained Chip Equalizer

[0263] The semi-blind CDMP-trained chip equalizer estimator directly determines the equalizer coefficients from the per-tone output matrix Ý based on the knowledge of the multi-user code matrix C_(d), the pilot code matrix C_(p) and the per-tone pilot symbol matrix Ś_(p). By despreading Equation 76 with the multi-user code matrix C_(d) and by assuming the per-tone equalizer matrix {acute over (G)} to be known and fixed, we obtain an LS estimate of the per-tone multi-user data symbol matrix {circumflex over ({acute over (S)})}_(d)={acute over (G)}·Ý·C_(d) ^(H). Substituting {circumflex over ({acute over (S)})}_(d) into the original LS problem of Equation 76 leads to a modified LS problem in the per-tone equalizer matrix {acute over (G)} only: $\begin{matrix}{{{\overset{\prime}{G} \cdot \overset{\prime}{Y} \cdot \left( {I_{IN} - {C_{d}^{H} \cdot C_{d}}} \right)} - {{\overset{\prime}{S}}_{p} \cdot C_{p}}}\overset{LS}{=}0} & (78)\end{matrix}$

[0264] which can be interpreted as follows. The equalized per-tone output matrix {acute over (G)}·Ý is first projected on the orthogonal complement of the subspace spanned by the multi-user code matrix C_(d), employing the projection matrix I_(IN)−C_(d) ^(H)·C_(d). The resulting equalized per-tone output matrix after projecting should then be as close as possible in a Least Squares sense to the per-tone pilot chip matrix Ś_(p)·C_(p).

C.2.c User-Specific Detection

[0265] As shown in FIG. 6, the obtained per-tone chip equalizer matrix {circumflex over ({acute over (G)})}, whether CDMP-trained or semi-blind CDMP-trained, may subsequently be used to extract the desired user's total data symbol matrix: $\begin{matrix}{{\hat{S}}_{u} = {\mathcal{T}_{zp}^{T} \cdot \mathcal{F}_{K}^{H} \cdot P_{t}^{T} \cdot \hat{\overset{'}{G}} \cdot \overset{'}{Y} \cdot C_{u}^{H}}} & (79)\end{matrix}$

[0266] where the estimated FD input matrix {circumflex over ({tilde over (X)})}=P_(t) ^(T)·{circumflex over ({acute over (G)})}·Ý is transformed to the TD by the compound IFFT matrix ℱ_(K) ^(H) and has its zero postfix removed by the transpose of the compound ZP transmit matrix 𝒯_(zp). The resulting estimate of the TD input matrix {circumflex over (X)} is finally despread with the desired user's code matrix C_(u) to obtain an estimate of the desired user's total data symbol matrix Ŝ_(u).

C.3 Simulation Results

[0267] We consider the downlink of an ST coded CDMA system with M_(t)=2 transmit antennas at the base station, M_(r)=2 receive antennas at the mobile station of interest and real orthogonal Walsh-Hadamard spreading codes of length N=8 along with a random overlay code for scrambling. The QPSK modulated data symbols are transmitted in bursts of 208 symbols. We assume that each channel K_(m) _(r) _(,m) _(t) [l] is FIR with order L=3 and has Rayleigh distributed channel taps of equal average power. We compare two different scenarios:

[0268] S1. A pilot-trained space-time RAKE receiver is applied to the space-time coded downlink DS-CDMA transmission scheme that was proposed for the UMTS and the IS-2000 WCDMA standards, also known as Space-Time Spreading (STS). The pilot-trained space-time RAKE receiver is similar to the time-only RAKE receiver, but instead of using a time-only maximum ratio combiner based on exact channel knowledge, we use a space-time combiner that is trained with the pilot.

[0269] S2. The proposed CDMP-trained and semi-blind CDMP-trained per-tone space-time chip equalizer methods are applied to the proposed space-time block coded downlink SCBT-DS-CDMA transmission scheme. We also consider the ideal fully-trained (FT) method that assumes perfect knowledge of the per-tone input matrix {acute over (X)} and corresponds to Equation 74. The FT method has zero spectral efficiency and is therefore useless in practice. The burst length of 208 symbols is split into M_(t)·I=16 symbol blocks of B=13 symbols each. Taking μ=L=3 and correspondingly K=B+μ=16, zero-padding results in an acceptable decrease in information rate, more specifically a decrease by a factor B/K≈0.81.

[0270] FIGS. 21 and 22 compare the average BER versus the average received SNR of both scenarios for different system loads. Also shown in the figures is the theoretical BER of QPSK with M_(t)·M_(r)·(L+1)=16-fold diversity in Rayleigh fading channels (single-user bound). FIG. 21 compares the performance for half system load, corresponding to U=3 active mobile stations. Firstly, we observe that S2 outperforms S1: e.g. at a BER of 10⁻³ the CDMP-trained method achieves an 8 dB gain compared to S1. Secondly, the semi-blind CDMP-trained method achieves a 0.5 dB gain compared to the regular CDMP-trained method and incurs a 3.5 dB loss compared to the ideal FT method. The ideal FT method in turn closely approaches the single-user bound. Thirdly, we observe that S2 comes close to extracting the full diversity of order M_(t)·M_(r)·(L+1). FIG. 22 compares the performance for full system load, corresponding to U=7 active mobile stations. Firstly, we observe that S1 now performs poorly compared to S2: e.g. at a BER of 10⁻² the CDMP-trained method achieves an 11 dB gain compared to S1. Secondly, the semi-blind and the regular CDMP-trained method now have exactly the same performance (this can easily be proven mathematically), and incur a 4 dB loss compared to the ideal FT method.

C.4 Conclusion

[0271] In this paper, we have combined Single-Carrier Block Transmission (SCBT) DS-CDMA with Time Reversal (TR) Space-Time Block Coding (STBC) for downlink multi-user MIMO communications. Moreover, we have developed two new direct equalizer estimation methods that act on a per-tone basis in the Frequency-Domain (FD) exploiting a Code Division Multiplexed Pilot (CDMP). The regular CDMP-trained method only exploits the presence of a CDMP whereas the semi-blind CDMP-trained method additionally capitalizes on the unused spreading codes in a practical CDMA system. Both the regular and the semi-blind CDMP-trained method come close to extracting the full diversity of order M_(t)·M_(r)·(L+1) independently of the system load. The semi-blind CDMP-trained method outperforms the regular CDMP-trained method for low to medium system load and proves its usefulness especially for small burst lengths.

D. Embodiment

D.1 SCBT-DS-CDMA Downlink System Model

[0272] Let us consider the downlink of a single-cell SCBT-DS-CDMA system with U active mobile stations. The base station has a single transmit antenna whereas the mobile station of interest has possibly multiple receive antennas.

D.1.a Transmitter Model for the Base Station

[0273] As shown in FIG. 23, the base station transforms U user data symbol sequences {s^(u)[i]}_(u=1) ^(U) and a pilot symbol sequence s^(p)[i] into a single transmitted chip sequence u[n]. Each user's data symbol sequence s^(u)[i] (pilot symbol sequence s^(p)[i]) is serial-to-parallel converted into blocks of B symbols, leading to the data symbol block sequence s^(u)[i]:=[s^(u)[iB], . . . , s^(u)[(i+1)B−1]]^(T) (pilot symbol block sequence s^(p)[i]). The u-th user's data symbol block sequence s^(u)[i] (pilot symbol block sequence s^(p)[i]) is subsequently spread by a factor N with the user composite code sequence c_(u)[n] (pilot composite code sequence c_(p)[n]), which is the multiplication of a user specific (pilot specific) orthogonal Walsh-Hadamard spreading code and a base station specific scrambling code. The different user chip block sequences and the pilot chip block sequence are added, resulting in the multi-user chip block sequence: $\begin{matrix}{{{x\lbrack n\rbrack} = {{\sum\limits_{u = 1}^{U}\quad {{s^{u}\lbrack i\rbrack}{c_{u}\lbrack n\rbrack}}} + {{s^{p}\lbrack i\rbrack}{c_{p}\lbrack n\rbrack}}}},\quad {i = \left\lfloor \frac{n}{N} \right\rfloor}} & (80)\end{matrix}$

[0274] The B×1 multi-user chip block sequence x[n] is transformed into the K×1 transmitted chip block sequence:

u[n]=T ₁ ·x[n]+T ₂ ·b  (81)

[0275] with K=B+μ, where T₁ is the K×B zero padding (ZP) transmit matrix T₁:=[I_(B) 0_(B×μ)]^(T) and T₂ is the K×μ Known Symbol Padding (KSP) transmit matrix T₂:=[0_(μ×B) I_(μ)]^(T). Note that this operation adds a μ×1 known symbol postfix b to each block of the multi-user chip block sequence x[n]. Finally, the transmitted chip block sequence u[n] is parallel-to-serial converted into the corresponding transmitted chip sequence [u[nK], . . . , u[(n+1)K−1]]^(T):=u[n].
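The effect of T₁ and T₂ in Equation 81 can be sketched with explicit matrices. The block below is only an illustration with assumed toy sizes (B=3, μ=2) and hypothetical helper names; it shows that the matrix form amounts to appending the known postfix to each chip block.

```python
def zp_matrix(B, mu):
    """T1 = [I_B 0_(B x mu)]^T, a K x B matrix with K = B + mu."""
    return [[1 if r == c else 0 for c in range(B)] for r in range(B + mu)]


def ksp_matrix(B, mu):
    """T2 = [0_(mu x B) I_mu]^T, a K x mu matrix."""
    return [[1 if r == B + c else 0 for c in range(mu)] for r in range(B + mu)]


def matvec(A, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transmit_block(x, b):
    """Equation 81: u[n] = T1 x[n] + T2 b."""
    B, mu = len(x), len(b)
    padded = matvec(zp_matrix(B, mu), x)
    postfix = matvec(ksp_matrix(B, mu), b)
    return [p + q for p, q in zip(padded, postfix)]


x_block = [1 + 1j, -1 + 1j, 1 - 1j]   # B = 3 chips (toy values)
known_postfix = [1, -1]               # mu = 2 known chips (toy values)
u_block = transmit_block(x_block, known_postfix)
```

Here `u_block` is simply `x_block` followed by `known_postfix`, so an implementation would normally concatenate rather than multiply by T₁ and T₂.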

D.1.b Receiver Model for the Mobile Station

[0276] We assume that the mobile station of interest is equipped with M_(r) receive antennas and has acquired perfect synchronisation. As shown in FIG. 24, the mobile station of interest transforms M_(r) received chip sequences v_(m) _(r) [n], m_(r)=1, . . . , M_(r), into an estimate of the desired user's data symbol sequence s^(u)[i] (we assume the u-th user to be the desired user). At each receive antenna, the received chip sequence v_(m) _(r) [n] is serial-to-parallel converted into blocks of K chips, resulting in the received chip block sequence v_(m) _(r) [n]:=[v_(m) _(r) [nK], . . . , v_(m) _(r) [(n+1)K−1]]^(T). The K×K receive matrix R:=I_(K) completely preserves each block of the received chip block sequence v_(m) _(r) [n], leading to the received multi-user chip block sequence y_(m) _(r) [n]:=R·v_(m) _(r) [n]. Assuming a sufficiently long known symbol postfix μ≧L (L is the maximum channel order of all channels), we obtain a simple input/output relationship in the time-domain:

y _(m) _(r) [n]={dot over (H)} _(m) _(r) ·(T ₁ ·x[n]+T ₂ ·b)+z _(m) _(r)[n]  (82)

[0277] where z_(m) _(r) [n] is the received noise block sequence and {dot over (H)}_(m) _(r) is a K×K circulant channel matrix describing the multi-path propagation from the base station's transmit antenna to the mobile station's m_(r)-th receive antenna. Note that there is no Inter Block Interference (IBI) because b acts as a cyclic prefix for each transmitted chip block u[n]. Transforming the time-domain (TD) received chip block sequence y_(m) _(r) [n] into the corresponding frequency-domain (FD) received chip block sequence {tilde over (y)}_(m) _(r) [n]:=F_(K)y_(m) _(r) [n], with F_(K) the K×K FFT matrix, leads to the following FD input/output relationship:

{tilde over (y)} _(m) _(r) [n]={tilde over (H)} _(m) _(r) ·{tilde over(x)}[n]+{tilde over (z)} _(m) _(r) [n]  (83)

[0278] where {tilde over (z)}_(m) _(r) [n]:=F_(K)z_(m) _(r) [n] is the FD received noise block sequence, {tilde over (x)}[n]:=F_(K)(T₁·x[n]+T₂·b) is the FD transmitted chip block sequence and {tilde over (H)}_(m) _(r) :=diag{{tilde over (h)}_(m) _(r) } is the K×K diagonal FD channel matrix having the FD channel response {tilde over (h)}_(m) _(r) for the m_(r)-th receive antenna as its main diagonal. Stacking the FD received chip block sequences of the M_(r) receive antennas, we finally obtain the following FD data model:

$\underbrace{\begin{bmatrix} \tilde{y}_{1}[n] \\ \vdots \\ \tilde{y}_{M_r}[n] \end{bmatrix}}_{\tilde{y}[n]} = \underbrace{\begin{bmatrix} \tilde{H}_{1} \\ \vdots \\ \tilde{H}_{M_r} \end{bmatrix}}_{\tilde{H}} \cdot \tilde{x}[n] + \underbrace{\begin{bmatrix} \tilde{z}_{1}[n] \\ \vdots \\ \tilde{z}_{M_r}[n] \end{bmatrix}}_{\tilde{z}[n]} \qquad (84)$
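The step from the circulant model of Equation 82 to the per-tone model of Equation 83 can be checked numerically. The sketch below is an illustration only: it assumes a single receive antenna, no noise, toy values, and an unnormalized DFT (the F_(K) of the text may carry a 1/√K factor). It shows that, because consecutive blocks end with the same postfix of length μ≧L, each received block is a circular convolution, so every tone sees a single complex gain.

```python
import cmath


def dft(x):
    """Unnormalized K-point DFT."""
    K = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]


def received_block(previous_block, current_block, h):
    """Noise-free linear convolution with h, observed over the current block only."""
    K = len(current_block)
    stream = list(previous_block) + list(current_block)
    return [sum(h[l] * stream[K + n - l] for l in range(len(h))) for n in range(K)]


# Toy values (assumed): B = 4, mu = 2, K = 6, channel order L = 2 <= mu.
b = [1, -1]                            # known symbol postfix
u_prev = [1j, -1, 1, -1j] + b          # previous transmitted block
u_cur = [1, 1j, -1j, -1] + b           # current transmitted block
h = [1.0, 0.5, -0.25j]                 # channel taps
K = len(u_cur)

y = received_block(u_prev, u_cur, h)
h_fd = dft(h + [0] * (K - len(h)))     # FD channel response, one gain per tone
y_fd, u_fd = dft(y), dft(u_cur)
max_error = max(abs(y_fd[k] - h_fd[k] * u_fd[k]) for k in range(K))
```

`max_error` is at rounding level, which is the diagonal structure that the per-tone equalizers in the following subsections rely on.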

D.1.c Data Model for Burst Processing

[0279] Assuming a burst length of I·B data symbols for each user, we can stack I·N consecutive FD received chip blocks {tilde over (y)}[n], defined in Equation 84, into a FD output matrix {tilde over (Y)}:=[{tilde over (y)}[0], . . . , {tilde over (y)}[IN−1]], leading to the following FD data model for burst processing:

{tilde over (Y)}={tilde over (H)}·{tilde over (X)}+{tilde over(Z)}  (85)

[0280] where the FD input matrix {tilde over (X)} and the FD noise matrix {tilde over (Z)} are similarly defined as the FD output matrix {tilde over (Y)}. Note from Equation 83 that:

{tilde over (X)}=F _(K)·(T ₁ ·X+T ₂ ·B)  (86)

[0281] where the TD input matrix X stacks I·N consecutive multi-user chip blocks x[n] and the KSP matrix B repeats I·N times the known symbol postfix b. By inspecting Equation 80, we can also write X as follows:

X=S _(d) ·C _(d) +S _(p) ·C _(p)  (87)

[0282] where the multi-user data symbol matrix S_(d):=[S₁, . . . , S_(U)] stacks the data symbol matrices of the different active users and the u-th user's data symbol matrix S_(u):=[s^(u)[0], . . . , s^(u)[I−1]] stacks I consecutive data symbol blocks of the u-th user. The pilot symbol matrix S_(p) is similarly defined as S_(u). The multi-user code matrix C_(d):=[C₁ ^(T), . . . , C_(U) ^(T)]^(T) stacks the code matrices of the different active users. The u-th user's code matrix C_(u):=diag{c_(u)[0], . . . , c_(u)[I−1]} stacks the u-th user's composite code vectors at I consecutive symbol instants, and the u-th user's composite code vector c_(u)[i]:=[c_(u)[iN], . . . , c_(u)[(i+1)N−1]] is used to spread the data symbol block s^(u)[i]. The pilot code matrix C_(p) and the pilot composite code vector c_(p)[i] are similarly defined as C_(u) and c_(u)[i], respectively.

D.2 Burst Frequency-Domain Chip Equalization

[0283] Armed with a suitable data model for burst processing, we can now proceed with the design of different Least Squares (LS) type of burst frequency-domain (FD) chip equalizers that process a burst of I data symbol blocks at once. Note that Recursive Least Squares (RLS) type of adaptive FD chip equalizers that process the data on a symbol block by symbol block basis can be easily derived from their corresponding LS burst version. Starting from Equation 85 and assuming the total FD channel matrix {tilde over (H)} to have full column rank and the FD input matrix {tilde over (X)} to have full row rank, it is always possible to find a Zero-Forcing (ZF) FD chip equalizer matrix {tilde over (G)}, for which {tilde over (G)}·{tilde over (Y)}−{tilde over (X)}=0, provided there is no noise present in the FD output matrix {tilde over (Y)}. In the presence of noise, we have to solve the corresponding Least Squares (LS) minimization problem, which we denote for convenience as:

$\tilde{G} \cdot \tilde{Y} - \tilde{X} \overset{\mathrm{LS}}{=} 0 \qquad (88)$

[0284] Since the total FD channel matrix {tilde over (H)} stacks M_(r) diagonal FD channel matrices, as indicated by Equations 83 and 84, it suffices for the total FD equalizer matrix {tilde over (G)} to have a similar structure:

{tilde over (G)}:=[{tilde over (G)} ₁ . . . {tilde over (G)} _(M) _(r)]  (89)

[0285] where the FD equalizer matrix for the m_(r)-th receive antenna {tilde over (G)}_(m) _(r) :=diag {{tilde over (g)}_(m) _(r) } acts on a per-tone basis at the chip block level (see also FIG. 10). Using Equations 86 and 87, we can rewrite the original LS problem of Equation 88 as:

$\tilde{G} \cdot \tilde{Y} - \tilde{S}_{d} \cdot C_{d} - \tilde{S}_{p} \cdot C_{p} - F_{K} \cdot T_{2} \cdot B \overset{\mathrm{LS}}{=} 0 \qquad (90)$

[0286] where the FD multi-user data symbol matrix {tilde over (S)}_(d) and the FD pilot symbol matrix {tilde over (S)}_(p) are defined as:

{tilde over (S)} _(d) :=F _(K) ·T ₁ ·S _(d), {tilde over (S)} _(p) :=F _(K) ·T ₁ ·S _(p)  (91)

[0287] Starting from Equation 90, we will design in the following three different FD methods for direct chip equalizer estimation that differ in the amount of a-priori information they exploit to determine the equalizer coefficients. The first method, coined KSP-trained, only exploits the presence of a known symbol postfix. The last two methods, coined joint CDMP/KSP-trained and semi-blind joint CDMP/KSP-trained, exploit the presence of both a known symbol postfix and a Code Division Multiplexed Pilot (CDMP).

D.2.a KSP-Trained Chip Equalizer

[0288] The KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix {tilde over (Y)} based on the knowledge of the KSP matrix B. By transforming Equation 90 to the TD with the IFFT matrix F_(K) ^(H) and by selecting the known symbol postfix with the KSP transmit matrix T₂, we obtain:

$T_{2}^{T} \cdot F_{K}^{H} \cdot \tilde{G} \cdot \tilde{Y} - B \overset{\mathrm{LS}}{=} 0 \qquad (92)$

[0289] because T₂ ^(T)·T₁=0_(μ×B) and T₂ ^(T)·T₂=I_(μ). Using the definition of T₂ in Equation 81, we can rewrite Equation 92 as:

$F_{K}^{H}(B+1:K,:) \cdot \tilde{G} \cdot \tilde{Y} - B \overset{\mathrm{LS}}{=} 0 \qquad (93)$

[0290] which can be interpreted as follows. The equalized FD output matrix {tilde over (G)}·{tilde over (Y)} is transformed to the TD with the last μ rows of the IFFT matrix F_(K) ^(H). The resulting matrix should be as close as possible to the KSP matrix B in a Least Squares sense.
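To make the unknowns explicit, Equation 93 can be rewritten column by column. The following is a sketch under the assumption of a single receive antenna (M_(r)=1), using the identity diag{g}·y=diag{y}·g; the symbol F_(μ) is shorthand introduced here for the last μ rows of F_(K) ^(H) and does not appear in the original text.

```latex
\[
\hat{\tilde{g}} = \arg\min_{\tilde{g}} \sum_{n=0}^{IN-1}
  \left\| F_{\mu} \cdot \operatorname{diag}\{\tilde{y}[n]\} \cdot \tilde{g} - b \right\|^{2},
\qquad F_{\mu} := F_{K}^{H}(B+1:K,:)
\]
```

Written this way, the KSP-trained estimator is an ordinary linear LS problem with K unknown equalizer coefficients and μ·I·N equations, so μ·I·N≧K is a necessary (though not sufficient) condition for a unique solution.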

D.2.b Joint CDMP/KSP-Trained Chip Equalizer

[0291] The joint CDMP/KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix {tilde over (Y)} based on the knowledge of the pilot code matrix C_(p), the pilot symbol matrix S_(p) and the KSP matrix B. By despreading Equation 90 with the pilot code matrix C_(p), we obtain:

$\tilde{G} \cdot \tilde{Y} \cdot C_{p}^{H} - F_{K} \cdot (T_{1} \cdot S_{p} + T_{2} \cdot B \cdot C_{p}^{H}) \overset{\mathrm{LS}}{=} 0 \qquad (94)$

[0292] because C_(d)·C_(p) ^(H)=0_(UI×I) due to the orthogonality of the user and the pilot composite code sequences at each symbol block instant. Equation 94 can be interpreted as follows. The equalized FD output matrix after despreading {tilde over (G)}·{tilde over (Y)}·C_(p) ^(H) should be as close as possible in a Least Squares sense to the FD version of the pilot symbol matrix S_(p) padded with the KSP matrix after despreading B·C_(p) ^(H).
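Because the equalizer matrix is built from diagonal blocks, Equation 94 decouples into K independent problems, one per tone. A sketch of the resulting closed form is given below, assuming I≧M_(r) and that each A_(k) has full row rank; the symbols D, d_(k) and A_(k) are introduced here for illustration only, and the k-th row of the m_(r)-th K×IN block of the FD output matrix after despreading is written as (Ỹ_(m_r) C_(p)^H)(k,:).

```latex
\[
D := F_{K} \cdot (T_{1} \cdot S_{p} + T_{2} \cdot B \cdot C_{p}^{H}), \qquad
A_{k} := \begin{bmatrix} (\tilde{Y}_{1} C_{p}^{H})(k,:) \\ \vdots \\ (\tilde{Y}_{M_r} C_{p}^{H})(k,:) \end{bmatrix}
\in \mathbb{C}^{M_r \times I}
\]
\[
\begin{bmatrix} \tilde{g}_{1}(k) & \cdots & \tilde{g}_{M_r}(k) \end{bmatrix}
 = d_{k} \cdot A_{k}^{H} \cdot \left( A_{k} \cdot A_{k}^{H} \right)^{-1},
 \qquad d_{k} := D(k,:), \quad k = 1, \ldots, K
\]
```

Each tone therefore requires only the inversion of an M_(r)×M_(r) matrix, which for a single receive antenna reduces to a scalar division.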

D.2.c Semi-Blind Joint CDMP/KSP-Trained Chip Equalizer

[0293] The semi-blind joint CDMP/KSP-trained chip equalizer estimator directly determines the equalizer coefficients from the FD output matrix {tilde over (Y)} based on the knowledge of the multi-user code matrix C_(d), the pilot code matrix C_(p), the pilot symbol matrix S_(p) and the KSP matrix B. By despreading Equation 90 with the multi-user code matrix C_(d) and by assuming the FD equalizer matrix {tilde over (G)} to be known and fixed, we obtain an LS estimate of the multi-user data symbol matrix {tilde over (S)}_(d):

{circumflex over ({tilde over (S)})} _(d) ={tilde over (G)}·{tilde over(Y)}·C _(d) ^(H) −F _(K) ·T ₂ ·B·C _(d) ^(H)  (95)

[0294] because C_(p)·C_(d) ^(H)=0_(I×UI) due to the orthogonality of the pilot and user composite code sequences at each symbol block instant. Substituting {circumflex over ({tilde over (S)})}_(d) into the original LS problem of Equation 90 leads to:

$\tilde{G} \cdot \tilde{Y} \cdot P_{d} - F_{K} \cdot (T_{1} \cdot S_{p} \cdot C_{p} + T_{2} \cdot B \cdot P_{d}) \overset{\mathrm{LS}}{=} 0 \qquad (96)$

[0295] where the projection matrix P_(d) is defined as:

P _(d) :=I _(IN) −C _(d) ^(H) ·C _(d)  (97)

[0296] Equation 96 can be interpreted as follows. Both the FD output matrix {tilde over (Y)} and the KSP matrix B are projected on the orthogonal complement of the subspace spanned by the multi-user code matrix C_(d), employing the projection matrix P_(d). The equalized FD output matrix after projecting {tilde over (G)}·{tilde over (Y)}·P_(d) should then be as close as possible in a Least Squares sense to the FD version of the pilot chip matrix S_(p)·C_(p) padded with the KSP matrix after projection B·P_(d).
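The role of the projection matrix in Equation 97 can be illustrated with a small numerical check. The sketch below assumes normalized composite codes (so that C_(d)·C_(d) ^(H) is the identity), toy sizes N=4, I=2, U=2, and an arbitrary scrambling sequence; all helper names are hypothetical. It verifies that P_(d) removes everything spanned by the active user codes while leaving the pilot code untouched.

```python
def walsh_hadamard(n):
    """Rows of the n x n Walsh-Hadamard matrix (n a power of two), entries +/-1."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix product."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def code_matrix(code, N):
    """C_u = diag{c_u[0], ..., c_u[I-1]}: I x IN, one 1 x N code vector per row."""
    C = [[0j] * len(code) for _ in range(len(code) // N)]
    for n, chip in enumerate(code):
        C[n // N][n] = chip
    return C


def projection_matrix(C_d):
    """P_d = I_IN - C_d^H C_d (Equation 97)."""
    G = matmul(hermitian(C_d), C_d)
    return [[(1 if r == c else 0) - G[r][c] for c in range(len(G))] for r in range(len(G))]


# Toy setup (assumed values): N = 4, I = 2, pilot on code 0, users on codes 1 and 2.
N, I = 4, 2
wh = walsh_hadamard(N)
scrambling = [1, 1j, -1, -1j, -1j, 1, 1j, -1]


def composite(row):
    """Normalized composite code: spreading chip times scrambling chip over sqrt(N)."""
    return [wh[row][n % N] * scrambling[n] / N ** 0.5 for n in range(I * N)]


C_p = code_matrix(composite(0), N)
C_d = code_matrix(composite(1), N) + code_matrix(composite(2), N)   # stacked rows
P_d = projection_matrix(C_d)

leak = max(abs(v) for row in matmul(C_d, P_d) for v in row)
pilot_change = max(abs(a - b) for ra, rb in zip(matmul(C_p, P_d), C_p)
                   for a, b in zip(ra, rb))
```

Both `leak` and `pilot_change` are zero up to rounding: the user contributions vanish after projection, while the pilot chips and the energy on unused codes remain available for training.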

D.2.d User-Specific Detection

[0297] As shown in FIG. 10, the obtained FD chip equalizer matrix {circumflex over ({tilde over (G)})}, whether KSP-trained, joint CDMP/KSP-trained or semi-blind joint CDMP/KSP-trained, may subsequently be used to extract the desired user's data symbol matrix:

$\hat{S}_{u} = \underbrace{T_{1}^{T} \cdot F_{K}^{H} \cdot \underbrace{\hat{\tilde{G}} \cdot \tilde{Y}}_{\hat{\tilde{X}}}}_{\hat{X}} \cdot C_{u}^{H} \qquad (98)$

[0298] where the estimated FD input matrix {circumflex over ({tilde over (X)})} is transformed to the TD by the IFFT matrix F_(K) ^(H) and has its known symbol postfix removed by the ZP transmit matrix T₁. The resulting estimate of the TD input matrix {circumflex over (X)} is finally despread with the desired user's code matrix C_(u) to obtain an estimate of the desired user's data symbol matrix.

D.3 Simulation Results

[0299] We consider the downlink of an SCBT-DS-CDMA system (see also FIG. 9) with a single transmit antenna at the base station, M_(r)=1 receive antenna(s) at the mobile station of interest, QPSK data modulation, an initial block length of B=13, real orthogonal Walsh-Hadamard spreading codes of length N=8 along with a complex random overlay code for scrambling and U=3 active mobile stations in total (half system load). We assume that each channel h_(m) _(r) [l] is FIR with order L=3 and has Rayleigh distributed channel taps of equal average power. Taking μ=L=3 and correspondingly K=B+μ=16, KSP results in an acceptable decrease in information rate, more specifically, a decrease by a factor B/K≈0.81. Both FIG. 11 and FIG. 12 compare the average BER versus the average received SNR per bit of the proposed methods with that of the ideal fully-trained (FT) method that assumes perfect knowledge of the FD input matrix {tilde over (X)} (see Equation 88). Also shown in the figures is the theoretical BER-curve for QPSK with M_(r)·(L+1)=4-fold diversity in Rayleigh fading channels (single-user bound).
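The quoted rate factor and the burst lengths used in the two simulated scenarios follow directly from the stated parameters. The short check below only restates that arithmetic; the variable names are chosen here for illustration.

```python
B, mu = 13, 3                 # block length and postfix length, as stated in the text
K = B + mu                    # transmitted block length
rate_factor = B / K           # fraction of each transmitted block carrying data chips
burst_lengths = {I: I * B for I in (1, 4)}   # data symbols per user and burst
```

With these values K is 16, `rate_factor` is 0.8125 (the 0.81 quoted above), and the bursts contain 13 and 52 data symbols per user.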

[0300] FIG. 25 compares the average BER versus the average received SNR for a large burst length I·B=52 corresponding to I=4. The joint CDMP/KSP-trained method and its semi-blind (SB) version have similar performance except at high SNR per bit where the semi-blind method has a slight advantage. At a BER of 10⁻², they outperform the KSP-trained method by 4 dB and are within 2 dB of the ideal FT method.

[0301] FIG. 26 compares the average BER versus the average received SNR for a small burst length I·B=13 corresponding to I=1. The semi-blind (SB) joint CDMP/KSP-trained method now outperforms both the regular joint CDMP/KSP-trained method and the KSP-trained method. At a BER of 10⁻¹, it achieves a 6 dB gain compared to the regular joint CDMP/KSP-trained method and incurs a 5 dB loss compared to the ideal FT method. The joint CDMP/KSP-trained method in turn outperforms the KSP-trained method by 8 dB at a BER of 10⁻¹.

D.4 Conclusion

[0302] In this paper, we have developed three new direct equalizer estimation methods for single-carrier block transmission (SCBT) DS-CDMA with Known Symbol Padding (KSP). The KSP-trained method, which only exploits the presence of the known symbol postfix, achieves reasonable performance only for rather large burst lengths. The joint CDMP/KSP-trained method, which additionally exploits the presence of a Code Division Multiplexed Pilot (CDMP), outperforms the KSP-trained method for both large and small burst lengths. The semi-blind joint CDMP/KSP-trained method, which additionally capitalizes on the unused spreading codes in a practical CDMA system (assuming knowledge of the multi-user code correlation matrix), proves its usefulness especially for small burst lengths. It outperforms the regular joint CDMP/KSP-trained method while staying within reasonable range of the ideal fully-trained method.

[0303] We can conclude that, from a performance point of view, the semi-blind joint CDMP/KSP-trained method for direct equalizer estimation is an interesting technique for future broadband cellular systems based on SCBT-DS-CDMA.

E. Embodiment

E.1 System Model for DS-CDMA Downlink with Spatial Multiplexing

[0304] Let us consider the downlink of a single-cell SM DS-CDMA system with U active mobile stations. We assume that the base station is equipped with M_(T) transmit antennas whereas the mobile station of interest is equipped with M_(R) receive antennas. As shown in FIG. 27, the base station transmits from each of its M_(T) antennas a synchronous code division multiplex, employing the same user specific orthogonal Walsh-Hadamard spreading codes and a different base station specific scrambling code. The multi-user chip sequence, transmitted by the m_(t)-th transmit antenna, consists of U active user signals and a continuous pilot signal:

$x_{m_t}[n] = \sum_{u=1}^{U} s_{m_t}^{u}[i]\, c_{m_t}^{u}[n] + s_{m_t}^{p}[i]\, c_{m_t}^{p}[n] \qquad (99)$

[0305] with i=└n/N┘. Each user's data symbol sequence s_(m) _(t) ^(u)[i] (pilot symbol sequence s_(m) _(t) ^(p)[i]) is spread by a factor N with the user composite code sequence c_(m) _(t) ^(u)[n] (pilot composite code sequence c_(m) _(t) ^(p)[n]). The u-th user composite code sequence for the m_(t)-th transmit antenna c_(m) _(t) ^(u)[n] (the pilot composite code sequence c_(m) _(t) ^(p)[n]) is the multiplication of the user specific Walsh-Hadamard spreading code sequence ś^(u)[n] (the pilot specific spreading code sequence ś^(p)[n]) and the base station's transmit antenna specific scrambling code sequence c_(m) _(t) ^(s)[n].

[0306] Assume that the mobile station is equipped with M_(R) receive antennas and let h_(m) _(r) _(,m) _(t) (t) denote the continuous-time channel from the m_(t)-th transmit antenna to the m_(r)-th receive antenna, including the transmit and receive filters. The received antenna signals are sampled at the chip rate 1/T_(c) and the obtained samples are stacked in the M_(R)×1 received vector sequence $y[n] = \begin{bmatrix} y_{1}[n] & y_{2}[n] & \ldots & y_{M_R}[n] \end{bmatrix}^{T}$,

[0307] which can be written as:

$y[n] = \sum_{m_t=1}^{M_T} \sum_{n'=0}^{L_{m_t}} h_{m_t}[n']\, x_{m_t}[n-n'] + e[n] \qquad (100)$

[0308] where e[n] is the M_(R)×1 received noise vector sequence and h_(m) _(t) [n] is the discrete-time M_(R)×1 vector channel from the m_(t)-th transmit antenna to the M_(R) receive antennas. Note that we model h_(m) _(t) [n] as an M_(R)×1 FIR vector filter of order L_(m) _(t).
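Equation 100 is a sum of vector FIR filters, one per transmit antenna. The sketch below evaluates it for a finite burst; the function name, the data layout and the toy values are assumptions made for illustration, and chips before n=0 are taken as zero.

```python
def mimo_fir_output(x, h, noise=None):
    """Equation 100 for a finite burst.

    x[m_t][n]       chip n sent from transmit antenna m_t
    h[m_t][l][m_r]  tap l of the vector channel from antenna m_t to receive antenna m_r
    noise[n][m_r]   optional additive noise samples e[n]
    """
    n_chips = len(x[0])
    M_R = len(h[0][0])
    y = []
    for n in range(n_chips):
        sample = [0j] * M_R if noise is None else list(noise[n])
        for chips, taps in zip(x, h):
            for l, tap in enumerate(taps):
                if n - l >= 0:
                    for m_r in range(M_R):
                        sample[m_r] += tap[m_r] * chips[n - l]
        y.append(sample)
    return y


# Toy values (assumed): M_T = 2 transmit antennas, M_R = 2 receive antennas, order 1.
x = [[1, -1, 1], [1j, 1j, -1j]]
h = [[[1.0, 0.5], [0.5, 0.0]],       # taps of the channel from transmit antenna 1
     [[0.0, 1.0], [0.0, -0.5]]]      # taps of the channel from transmit antenna 2
y = mimo_fir_output(x, h)
```

Each entry `y[n]` is the M_(R)×1 received vector at chip instant n, with the contributions of all transmit antennas superimposed.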

[0309] With M_(T)I the burst length, I the number of data symbols per transmit antenna and Q+1 the temporal smoothing factor, we now introduce the following (Q+1)M_(R)×IN output matrix:

$\underbrace{\begin{bmatrix} y[0] & \cdots & y[IN-1] \\ \vdots & & \vdots \\ y[Q] & \cdots & y[Q+IN-1] \end{bmatrix}}_{Y} = \sum_{m_t=1}^{M_T} \mathcal{H}_{m_t} \cdot \underbrace{\begin{bmatrix} x_{m_t}^{-L_{m_t}} \\ \vdots \\ x_{m_t}^{Q} \end{bmatrix}}_{X_{m_t}} + E \qquad (101)$

[0310] where the noise matrix E is similarly defined as Y, H_(m) _(t) is the (Q+1)M_(R)×r_(m) _(t) channel matrix (with block Toeplitz structure) and X_(m) _(t) is the r_(m) _(t) ×IN input matrix for the m_(t)-th transmit antenna ($r_{m_t} := L_{m_t} + 1 + Q$, and $r := \sum_{m_t=1}^{M_T} r_{m_t}$ is called the system order).

[0311] The multi-user chip sequence transmitted from the m_(t)-th antenna at delay a is defined as $x_{m_t}^{a} := \begin{bmatrix} x_{m_t}[a] & x_{m_t}[a+1] & \ldots & x_{m_t}[a+IN-1] \end{bmatrix}$.
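The stacking that produces the output matrix Y of Equation 101 and the system order r can be sketched as follows. The helper names and the toy samples are assumptions made for illustration; the last line evaluates r for equal channel orders L=3, Q=8 and three transmit antennas.

```python
def output_matrix(y, Q, IN):
    """Y of Equation 101: block row q holds y[q], ..., y[q + IN - 1] as columns.

    y[n][m_r] is the sample of receive antenna m_r at chip instant n, so the
    result has (Q + 1) * M_R rows and IN columns.
    """
    M_R = len(y[0])
    return [[y[q + n][m_r] for n in range(IN)]
            for q in range(Q + 1) for m_r in range(M_R)]


def system_order(channel_orders, Q):
    """r = sum over transmit antennas of r_mt = L_mt + 1 + Q."""
    return sum(L + 1 + Q for L in channel_orders)


# Toy values (assumed): M_R = 2, Q = 1, IN = 3, so Y is 4 x 3.
y = [[10 * n + m_r for m_r in range(2)] for n in range(4)]   # y[n][m_r]
Y = output_matrix(y, Q=1, IN=3)
r = system_order([3, 3, 3], Q=8)
```

Each additional block row of Y is the previous one shifted by one chip, which is what gives the channel matrices their block Toeplitz structure.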

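[0312] Armed with a suitable data model for burst processing, we can now proceed with the design of the new combined linear and non-linear downlink multi-user receiver. As shown in FIG. 28, it consists of an initial linear stage, based on space-time chip-level equalization followed by a bank of correlators and a decision device, and one or more non-linear ST-PIC/RAKE stages. The chip-level equalizer of the initial linear stage forms: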

{circumflex over (x)} _(m) _(t) ₍₀₎ =g _(m) _(t) ·Y  (102)

[0313] where the 1×(Q+1)M_(R) equalizer vector g_(m) _(t) makes a linear combination of the rows of the output matrix Y and where {circumflex over (x)}_(m) _(t) ₍₀₎ is the initial soft estimate of x_(m) _(t) :=x_(m) _(t) ^(0), the multi-user chip sequence transmitted from the m_(t)-th transmit antenna at delay a=0. The equalizer vector can be directly estimated from the output matrix by exploiting the presence of the code-multiplexed pilot in either a training-based or a semi-blind Least Squares (LS) cost function.

[0314] The bank of correlators descrambles and despreads the equalized signal with all active user composite code sequences for the m_(t)-th transmit antenna to obtain the initial soft decisions:

$\hat{s}_{m_t(0)}^{d} = \hat{x}_{m_t(0)} \cdot C_{m_t}^{dH} \qquad (103)$

[0315] where ŝ_(m) _(t) ₍₀₎ ^(d) is the initial soft estimate of the 1×UI multi-user data symbol vector for the m_(t)-th transmit antenna $s_{m_t}^{d} := \begin{bmatrix} s_{m_t}^{1} & \cdots & s_{m_t}^{U} \end{bmatrix}$

[0316] that stacks the data symbol vectors of the different active users at the corresponding transmit antenna. The UI×IN multi-user code matrix $C_{m_t}^{d} := \begin{bmatrix} C_{m_t}^{1T} & \cdots & C_{m_t}^{UT} \end{bmatrix}^{T}$

[0317] for the m_(t)-th transmit antenna stacks the code matrices of the different active users. The initial soft decisions ŝ_(m) _(t) ₍₀₎ ^(d) are then input to a decision device that generates the initial hard decisions {circumflex over (s)}_(m) _(t) ₍₀₎ ^(d).
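The correlator bank and the decision device of the initial linear stage can be sketched for a single user and a single transmit stream as follows. This is an illustration under assumptions: ±1 composite code chips, a correlation normalized by N, a QPSK alphabet scaled to unit energy, and a small artificial error added to the equalized chips; the function names are hypothetical.

```python
def correlate(x_hat, code, N):
    """One correlator: s_hat[i] = sum_n x_hat[iN + n] * conj(c[iN + n]) / N."""
    return [sum(x_hat[i * N + n] * code[i * N + n].conjugate() for n in range(N)) / N
            for i in range(len(x_hat) // N)]


def qpsk_decision(z):
    """Nearest point of the QPSK alphabet {+-1 +-1j} / sqrt(2)."""
    a = 2 ** -0.5
    return complex(a if z.real >= 0 else -a, a if z.imag >= 0 else -a)


# Toy values (assumed): N = 4, I = 2, one user code, soft chips with small errors.
a = 2 ** -0.5
code = [1, -1, 1, -1, -1, 1, -1, 1]                  # composite code, +/-1 chips
symbols = [complex(a, -a), complex(-a, a)]           # transmitted QPSK symbols
x_hat = [symbols[n // 4] * code[n] + 0.05 * (-1) ** n for n in range(8)]
soft = correlate(x_hat, code, 4)                     # initial soft decisions
hard = [qpsk_decision(z) for z in soft]              # initial hard decisions
```

The soft decisions are offset by ±0.05 from the transmitted symbols, and the decision device maps them back to the correct constellation points.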

E.3 k-th Non-Linear ST-PIC/RAKE Stage

[0318] As shown in FIG. 28, the k-th ST-PIC/RAKE stage uses the estimates of the multi-user data symbol vectors $\left\{ \hat{\underline{s}}_{m_t(k-1)}^{d} \right\}_{m_t=1}^{M_T}$

[0319] provided by the previous stage (whether the previous ST-PIC/RAKE stage when k>1 or the initial linear stage when k=1) to obtain the refined estimates $\left\{ \hat{\underline{s}}_{m_t(k)}^{d} \right\}_{m_t=1}^{M_T}$.

[0320] Each ST-PIC/RAKE stage decomposes into a non-linear ST Parallel Interference Cancellation (PIC) step and a linear ST-RAKE combining step. FIG. 30 shows the k-th ST-PIC/RAKE stage focused on a particular transmit stream j of a particular user l. The ST-PIC step first regenerates the MUI experienced by the j-th transmit stream of the l-th user at each receive antenna:

$Z_{j(k)}^{l} = \sum_{m_t=1}^{M_T} \hat{\mathcal{H}}_{m_t} \cdot \hat{\underline{X}}_{m_t(k)} \qquad (104)$

[0321] where the MUI matrix for the j-th transmit stream of the l-th user Z_(j(k)) ^(l) is similarly defined as the output matrix Y with Q=L. The hard estimate of the m_(t)-th input matrix {circumflex over (X)}_(m) _(t) _((k)) is similarly defined as X_(m) _(t) except for the j-th one {circumflex over (X)}_(j(k)) that does not include the contribution of the l-th user (see FIG. 16). The output matrix after cancellation for the j-th transmit stream of the l-th user then becomes:

$Y_{j(k)}^{l} = Y_{r} - Z_{j(k)}^{l} \qquad (105)$

[0322] where Y_(j(k)) ^(l) and Y_(r) are similarly defined as Y with Q=L.

[0323] The ST-RAKE combining step performs ST Maximum Ratio Combining (MRC) at the chip-level (CL) followed by a correlation with the composite code sequence of the j-th transmit stream of the l-th user:

$\hat{s}_{j(k)}^{l} = f \cdot Y_{j(k)}^{l} \cdot C_{j}^{lH} \qquad (106)$

[0324] where the 1×(L+1)M_(R) MRC vector f contains the complex conjugate ST channel coefficients. The obtained k-th soft estimate of the j-th data symbol vector of the l-th user ŝ_(j(k)) ^(l) is then fed into a decision device that generates the k-th hard estimate {circumflex over (s)}_(j(k)) ^(l).
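One ST-PIC/RAKE pass, in the spirit of Equations 104 to 106, can be sketched for a single desired stream as follows. This is an illustration under strong assumptions: noise-free reception, perfect channel knowledge, perfect hard estimates of all interfering signals, flat channels (L=0) in the toy example, and unnormalized ±1 codes. The function names and data layout are hypothetical.

```python
def regenerate(x_streams, h):
    """Noise-free received samples y[n][m_r] for chip streams x_streams[m_t][n].

    h[m_t][l][m_r] is tap l of the channel from transmit antenna m_t to antenna m_r.
    """
    n_chips, M_R = len(x_streams[0]), len(h[0][0])
    y = [[0j] * M_R for _ in range(n_chips)]
    for chips, taps in zip(x_streams, h):
        for n in range(n_chips):
            for l, tap in enumerate(taps):
                if n - l >= 0:
                    for m_r in range(M_R):
                        y[n][m_r] += tap[m_r] * chips[n - l]
    return y


def pic_rake(y, h, interference_streams, j, code, N):
    """One ST-PIC/RAKE pass for transmit stream j of the desired user.

    interference_streams[m_t][n] holds the regenerated chips of every signal
    except the desired user's chips on stream j.
    """
    z = regenerate(interference_streams, h)                              # MUI (cf. Eq. 104)
    clean = [[a - b for a, b in zip(yn, zn)] for yn, zn in zip(y, z)]    # cf. Eq. 105
    n_chips = len(clean)
    # Chip-level space-time MRC with the conjugate taps of stream j (vector f).
    chips = [sum(tap[m_r].conjugate() * clean[n + l][m_r]
                 for l, tap in enumerate(h[j]) if n + l < n_chips
                 for m_r in range(len(tap)))
             for n in range(n_chips)]
    # Correlation with the desired user's composite code (cf. Eq. 106).
    return [sum(chips[i * N + n] * code[i * N + n].conjugate() for n in range(N))
            for i in range(n_chips // N)]


# Toy values (assumed): M_T = 2, M_R = 2, flat channels, N = 4, one symbol.
N = 4
h = [[[1.0, 1j]], [[0.5, -0.5]]]
code, other_code = [1, -1, 1, -1], [1, 1, -1, -1]
s, s_other = 1 - 1j, -1 + 1j
desired = [s * c for c in code]                  # desired user's chips on stream 0
other = [s_other * c for c in other_code]        # another user on stream 0
x1 = [1j, -1, 1, -1j]                            # everything sent on stream 1
y = regenerate([[d + o for d, o in zip(desired, other)], x1], h)
soft = pic_rake(y, h, [other, x1], 0, code, N)
```

With perfect cancellation only the desired user's signal remains, so `soft[0]` equals the transmitted symbol scaled by N times the channel energy of stream 0 (here 4·2=8). In practice the hard estimates contain errors, and the cancellation is only approximate.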

E.4 Simulation Results

[0325] The simulations are performed for the downlink of a spatially multiplexed DS-CDMA system with U=3 active, equal power users, QPSK data modulation, real orthogonal Walsh-Hadamard spreading codes of length N=8 along with a complex random overlay code for scrambling. The number of transmit antennas at the base station is M_(T)=3 whereas the number of receive antennas at the mobile station is M_(R)=4. The block length is I=50. The ST vector channels with order L_(m) _(t) =L=3 have M_(R)(L+1)=16 Rayleigh distributed channel taps of equal average power. The temporal smoothing factor is chosen to be Q+1=M_(T)L=9 and the performance is averaged over 100 channels in total.
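The choice of the temporal smoothing factor is stated without derivation. One plausible reading, offered here as an assumption rather than as the stated reasoning, is that it is the smallest smoothing factor for which the stacked channel matrix of Equation 101 has at least as many rows as columns (a necessary condition for full column rank), taking four receive antennas, three transmit antennas and equal channel orders L=3:

```latex
\[
(Q+1)\, M_R \;\ge\; r = \sum_{m_t=1}^{M_T} (L + 1 + Q) = M_T (Q+1) + M_T L
\quad\Longleftrightarrow\quad
Q + 1 \;\ge\; \frac{M_T L}{M_R - M_T} = \frac{3 \cdot 3}{4 - 3} = 9
\]
```

With Q+1=9 the stacked channel matrix is square, of size 36×36. The bound only makes sense when there are more receive than transmit antennas.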

[0326] The proposed receiver with only one ST-PIC/RAKE stage achieves a significant performance improvement compared to the linear only receiver. This is shown in FIG. 31 that compares the average BER versus the SNR per bit of the linear stage only to the complete receiver with one ST-PIC/RAKE stage. The initial linear stage is either the ideal ST-RAKE receiver with perfect channel knowledge, the pilot-trained (PT-ST-CLEQ), the semi-blind (SB-ST-CLEQ) or the fully-trained space-time chip-level equalizer receiver (FT-ST-CLEQ). Also shown in the figure is the theoretical BER-curve of QPSK with M_(R)(L+1)-th order diversity in Rayleigh fading channels (single-user bound). The SB-ST-CLEQ with one ST-PIC/RAKE stage, for instance, outperforms the SB-ST-CLEQ only receiver by 6.5 dB at a BER of 10⁻³.

[0327] The proposed receiver with two successive ST-PIC/RAKE stages gives an additional gain of 1.5 dB for the SB-ST-CLEQ and 3.8 dB for the PT-ST-CLEQ. This is shown in FIG. 32.

E.5 Conclusion and Future Work

[0328] We have presented a new combined linear and non-linear downlink multi-user receiver that copes with the M_(T)-fold increase of the MUI in a SM DS-CDMA downlink that uses M_(T) transmit antennas. The proposed receiver consists of an initial linear stage based on space-time chip-level equalization and possibly multiple non-linear ST-PIC/RAKE stages. Simulation results show that a single ST-PIC/RAKE stage significantly increases the performance of the linear only receiver. A second ST-PIC/RAKE stage brings an additional gain at the expense of increased complexity. We can conclude that the proposed downlink multi-user receiver is a promising technique for future MIMO DS-CDMA systems from both a performance and a complexity point of view.

What is claimed is:
 1. A method of communicating data signals among a plurality of users, the method comprising: spreading and scrambling at least a portion of a block of data obtainable by grouping of data symbols using a serial-to-parallel operation; combining the spread and scrambled portions of said blocks associated with at least two users; adding redundant data to said combined spread and scrambled portions; and transmitting said combined spread and scrambled portions having said redundant data.
 2. The method recited in claim 1, wherein said spreading and scrambling comprise using a code sequence obtained by multiplying a user terminal-specific code and a base station specific scrambling code.
 3. The method recited in claim 1, further comprising the step of generating a plurality of independent block portions.
 4. The method recited in claim 1, further comprising generating a plurality of block portions.
 5. The method recited in claim 3, wherein the steps of claim 1 are performed repeatedly to process substantially all of the block portions, thereby generating streams comprising a plurality of combined spread and scrambled block portions.
 6. The method recited in claim 5, further comprising encoding each of said streams after the step of combining but before the step of transmitting said spread and scrambled portions.
 7. The method recited in claim 5, further comprising encoding said streams in the space-time domain after performing the step of combining and before performing the step of transmitting said spread and scrambled portions, thereby combining information from at least two of said streams.
 8. The method recited in claim 7, wherein said step of space-time encoding said streams comprises performing at least one of block space-time encoding and trellis space-time encoding.
 9. The method recited in claim 1, further comprising performing inverse subband processing after performing the step of combining and before performing the step of transmitting said spread and scrambled portions.
 10. The method recited in claim 1, further comprising performing linear precoding.
 11. The method recited in claim 1, comprising the step of applying adaptive loading for each user terminal.
 12. The method recited in claim 1, wherein the step of combining spread and scrambled block portions comprises adding a pilot signal to the sum of spread and scrambled block portions.
 13. The method recited in claim 1, wherein the step of adding redundant data comprises the addition of at least one of a cyclic prefix, a zero postfix, and a symbol postfix.
 14. The method recited in claim 1, wherein a transmit system for wireless communication is configured to perform the steps among the plurality of users.
 15. The method recited in claim 1, further comprising performing a processing technique configured to improve integrity of said spread and scrambled portions in the presence of frequency-selective fading.
 16. An apparatus for communicating data signals among a plurality of users, the apparatus comprising: a grouping circuit configured to group data symbols for transmission; a spreader and scrambling circuit configured to spread and/or scramble said grouped data symbols; an adder configured to add redundant data to said spread and scrambled grouped data symbols.
 17. The apparatus recited in claim 16, further comprising at least one transmit antenna configured to transmit said spread and scrambled grouped data symbols having said redundancy.
 18. The apparatus recited in claim 16, further comprising a circuit configured to improve integrity of said grouped data symbols in the presence of frequency-selective fading.
 19. The apparatus recited in claim 16, further comprising a space-time encoder.
 20. The apparatus recited in claim 16, further comprising one or more circuits configured to perform inverse subband processing on said grouped data symbols.
 21. A methodfor communicating data signals between at least one base station and atleast one terminal, the method comprising: receiving a signal from atleast one antenna; performing subband processing of a version of saidreceived signal; separating contribution signals associated with aplurality of users, the contribution signals being derivable from saidreceived signal; and restoring data from each one of the contributionsignals, the data having been configured to resist frequency-selectivefading.
 22. The method recited in claim 21, wherein the step ofseparating the contributions comprises filtering at a chip rate at leasta portion of the subband processed version of said received signal, andthen despreading said filtered portion.
 23. The method recited in claim21, wherein the step of separating the contributions comprisesdespreading at least a portion of the subband processed version of saidreceived signal, and then filtering said despread portion.
 24. Themethod recited in claim 21, wherein the step of receiving a signalcomprises receiving the signal from a plurality of antennas, andgenerating data streams, and wherein the step of subband processing isperformed on each of said data streams, and producing a subbandprocessed version of said received signal.
 25. The method recited inclaim 21, further comprising performing the step of space-time decodingon each of the streams.
 26. The method recited in claim 25, wherein thestep of space-time decoding comprises performing at least one of blockdecoding and trellis decoding.
 27. The method recited in claim 22,further comprising performing the step of inverse subband processing onat least one filtered, subband processed version of the received signal.28. The method recited in claim 22, wherein the step of filteringcomprises processing said portion with a filter characterized by filtercoefficients that are determinable in a semi-blind or in atraining-based technique.
 29. The method recited in claim 22, wherein the step of filtering comprises processing said portion with a filter characterized by filter coefficients that are determinable without the necessity to perform channel estimation.
 30. The method recited in claim 22, wherein the step of filtering comprises processing said portion with a filter characterized by filter coefficients that are determined while maintaining one version of the filtered signal as close as possible to a version of the pilot symbol.
 31. The method recited in claim 30, wherein the version of the filtered signal comprises the filtered signal after despreading with a composite code associated with the base station-specific scrambling code and the pilot code, and wherein the version of the pilot symbol comprises the pilot symbol as formulated in a per tone ordering.
 32. The method recited in claim 30, wherein the version of the filtered signal comprises the filtered signal after projecting on an orthogonal complement on a subspace spanned by a plurality of composite codes associated with the base station-specific scrambling code and user-specific codes, and wherein the version of the pilot symbol comprises the pilot symbol spread with a composite code associated with the base station-specific scrambling code and the pilot code as put in per tone ordering.
 33. The method recited in claim 21, further comprising removing transmit redundancy resulting from adding redundant data.
 34. The method recited in claim 21, wherein the step of restoring data comprises performing the step of linear de-precoding.
 35. The method recited in claim 21, wherein a receiver is configured to perform all said steps among a plurality of users in a wireless communication system.
 36. An apparatus for communicating data signals among a plurality of users, the apparatus comprising: a plurality of circuits configured to perform subband processing on received signals; and a despreader circuit configured to determine an estimate of a plurality of subband processed symbols received for at least one user terminal from the received signals.
 37. The apparatus recited in claim 36, further comprising a plurality of antennas receiving signals.
 38. The apparatus recited in claim 36, wherein said circuitry adapted for determining an estimate of symbols comprises a plurality of circuits for inverse subband processing.
 39. The apparatus recited in claim 36, wherein said despreader circuit comprises a plurality of filters configured to filter at least a portion of a subband processed version of said received signals.
 40. The apparatus recited in claim 36, wherein said despreader circuit comprises a plurality of filters configured to filter at a chip rate at least a portion of a subband processed version of said received signals.
 41. The apparatus recited in claim 36, further comprising a space-time decoder configured to decode at least a portion of said received signals.
 42. A method of communicating data signals among a plurality of users, the method comprising: transforming data symbols associated with the plurality of users into respective user sequences; performing linear precoding on each user sequence; converting each user sequence into data blocks; spreading each user data block with a user-specific composite code sequence derived from the multiplication of an orthogonal spreading code specific to said user and a base station-specific scrambling code; adding the spread block sequences of all users; and performing an inverse transform on the added block sequences into the time domain.
 43. A method of communicating data signals among a plurality of users, the method comprising: converting data symbols associated with the plurality of users into respective user data blocks; linearly precoding said data blocks; placing said precoded data blocks onto a plurality of carrier signals; demultiplexing said precoded data blocks into a plurality of sequences; spreading each demultiplexed sequence with a user code sequence, which is derivable from the multiplication of a respective user-specific orthogonal spreading code by a base station-specific scrambling code; adding all spread sequences to a pilot block sequence to produce a sum sequence; and performing a space-time encoding on the sum sequence.
 44. A method of communicating data signals among a plurality of users, the method comprising: demultiplexing a data symbol sequence into several data streams; converting said data streams into respective symbol blocks; spreading said symbol blocks with a user composite code sequence, which is derivable from the multiplication of a user-specific orthogonal spreading code by a base station-specific scrambling code; adding a pilot block sequence to each spread sequence to produce an output chip block; padding each output chip block with a zero postfix; and performing a parallel-to-serial conversion on the padded block.
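By way of non-limiting illustration, the sequence of steps recited in claim 44 may be sketched for a single user as follows. The sketch is not part of the claims and rests on several assumptions that the claims do not fix: BPSK symbols, Walsh-Hadamard codes as the orthogonal spreading codes, a short fixed ±1 scrambling sequence, block spreading in which each symbol block is repeated once per chip of the composite code, and a pilot block spread with its own composite code. All function names and parameter values are hypothetical.

```python
# Illustrative sketch only; names and parameters are assumptions, not claim language.

def walsh_codes(n):
    """Return n mutually orthogonal Walsh-Hadamard codes of length n (n a power of two)."""
    codes = [[1]]
    while len(codes) < n:
        codes = [r + r for r in codes] + [r + [-c for c in r] for r in codes]
    return codes

def composite_code(spreading, scrambling):
    """Chip-wise product of an orthogonal spreading code and a scrambling code."""
    return [c * s for c, s in zip(spreading, scrambling)]

def demultiplex(symbols, n_streams):
    """Serial-to-parallel: distribute symbols round-robin over n_streams streams."""
    return [symbols[i::n_streams] for i in range(n_streams)]

def block_spread(symbol_block, code):
    """Assumed block spreading: one scaled copy of the symbol block per code chip."""
    return [[s * chip for s in symbol_block] for chip in code]

def transmit_stream(stream, block_len, user_code, pilot_block, pilot_code, postfix_len):
    """Spread, add pilot, zero-pad, and serialize one data stream.

    Assumes len(stream) is a multiple of block_len.
    """
    chips = []
    pilot_chips = block_spread(pilot_block, pilot_code)
    for start in range(0, len(stream), block_len):
        block = stream[start:start + block_len]
        user_chips = block_spread(block, user_code)
        for u, p in zip(user_chips, pilot_chips):
            out = [a + b for a, b in zip(u, p)]   # output chip block
            out += [0] * postfix_len              # zero postfix
            chips.extend(out)                     # parallel-to-serial conversion
    return chips

# Hypothetical parameters for a small example.
N, BLOCK_LEN, POSTFIX_LEN = 4, 2, 1
scrambling = [1, -1, -1, 1]
walsh = walsh_codes(N)
pilot_code = composite_code(walsh[0], scrambling)
user_code = composite_code(walsh[1], scrambling)
pilot_block = [1, 1]

symbols = [1, -1, 1, 1, -1, -1, 1, -1]
streams = demultiplex(symbols, 2)
tx = [transmit_stream(s, BLOCK_LEN, user_code, pilot_block, pilot_code, POSTFIX_LEN)
      for s in streams]
```

Because both composite codes in this sketch share the same ±1 scrambling sequence, they remain orthogonal, so correlating the received chip blocks with the user composite code removes the pilot contribution and returns the symbol block scaled by the code length, at least over an ideal channel.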